CN117392187A - SAR image change detection method and equipment based on spatial attention model - Google Patents

SAR image change detection method and equipment based on spatial attention model

Info

Publication number
CN117392187A
Authority
CN
China
Prior art keywords
image
detected
model
filters
background
Prior art date
Legal status
Pending
Application number
CN202311298068.8A
Other languages
Chinese (zh)
Inventor
谢聪
张先义
庄龙
郑昱
Current Assignee
CETC 14 Research Institute
Original Assignee
CETC 14 Research Institute
Priority date
Filing date
Publication date
Application filed by CETC 14 Research Institute
Priority to CN202311298068.8A
Publication of CN117392187A


Classifications

    • G06T 7/337 Image registration using feature-based methods involving reference images or patches
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/08 Neural networks; learning methods
    • G06T 5/40 Image enhancement or restoration using histogram techniques
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]


Abstract

The invention discloses a SAR image change detection method and device based on a spatial attention model, belonging to the technical field of image change detection. A SAR image of the same scene from a different time period is selected as the reference image for the image to be detected, and the periphery of the image to be detected is filled with background so that it matches the reference image in size. HOG features and gray-level features are extracted from the reference image and the image to be detected, and fast matching between the two is computed in the Fourier domain with a background-aware correlation filtering model to obtain a filter response map, i.e., the registered image to be detected. A spatial-information-enhanced Unet classification model then detects the changed regions in the registered image. The method reduces the sensitivity of the detection model to complex background environments, improves the accuracy of weak-change-region detection, and achieves SAR image change detection under complex backgrounds.

Description

SAR image change detection method and equipment based on spatial attention model
Technical Field
The invention belongs to the technical field of image change detection, and particularly relates to a SAR image change detection method and device based on a spatial attention model.
Background
SAR image change detection can effectively reveal changes in ground moving targets through a series of processes such as high-precision image registration and target change detection, update the environmental situation in time, and ultimately form an assessment of the environmental situation, effectively addressing SAR image interpretation problems such as low interpretation accuracy and slow information extraction. SAR image change detection technology has a wide range of application fields.
The SAR imaging tracks of different periods are not exactly the same, so change detection requires registration of the SAR images from different periods. Because the SAR ground imaging environment is complex and the real-time requirements are high, the demands on registration are raised:
(1) Accurate registration of images
Accurate registration of multi-temporal SAR images is a precondition for any change detection method; low registration accuracy causes the method to extract a large number of pseudo-change regions due to image misalignment, reducing its accuracy. Existing image registration methods are affected by the inherent speckle noise of SAR images and their feature matching accuracy is insufficient, so suppressing speckle noise must be an important consideration when designing an image registration method.
(2) Difference map information loss
Existing SAR image change detection methods obtain a difference map by analyzing the differences between multi-temporal SAR images, and identify changed and unchanged regions from it. However, in generating the difference map these methods lose the gray-level, texture, and semantic information hidden in the original SAR images, which limits the discriminative ability of the change detection model.
(3) False alarms in pixel-level change detection
Affected by imaging-condition factors such as the sensor and the background environment, regions with no semantic or attribute change may be falsely detected as changed regions because of changes in pixel gray values. Target-level or object-level change detection is therefore better suited than pixel-level change detection for detecting changes in the number and positions of targets.
Disclosure of Invention
The invention aims to provide a SAR image change detection method and equipment based on a spatial attention model, which can reduce the sensitivity of the detection model to a complex background environment, improve the precision of weak area change detection and achieve the purpose of SAR image change detection under the complex background.
Specifically, in one aspect, the present invention provides a method for detecting SAR image change based on a spatial attention model, including:
image matching: selecting a SAR image of the same scene from a different time period as the reference image for the image to be detected, and filling the periphery of the image to be detected with background so that it matches the reference image in size; extracting HOG features and gray-level features from the reference image and the image to be detected, and performing fast matching between the two in the Fourier domain with a background-aware correlation filtering model to obtain a filter response map, i.e., the registered image to be detected;
and change detection: detecting the changed regions in the registered image to be detected with a spatial-information-enhanced Unet classification model.
Further, the background-aware correlation filtering model is described by Equation 3, the ADMM iteration:

f^(k+1) = argmin_f (1/2)·||y − Σ_{c=1..C} x_c ⊛ f_c||² + (η/2)·Σ_{c=1..C} ||f_c − w·p_c^(k) + h_c^(k)||²
p^(k+1) = argmin_p (η/2)·Σ_{c=1..C} ||f_c^(k+1) − w·p_c + h_c^(k)||²
h^(k+1) = h^(k) + f^(k+1) − w·p^(k+1)

where f is a set of filters, f^(k+1) its value at iteration k+1, and f_c its c-th feature channel; h is another set of filters (the Lagrange multipliers), with h^(k+1) and h_c defined analogously; argmin is the function solving for the optimal solution; c is the feature-channel index and C the number of feature channels; x is the input feature and x_c the input feature of the c-th channel; ⊛ denotes circular convolution; y is the desired output, a two-dimensional Gaussian distribution; η is a preset constant; p is the set of background-aware correlation filters to be learned, with p^(k+1) and p^(k) its values at iterations k+1 and k and p_c its c-th channel; and w is the preset feature-channel weight.
Further, the background-aware correlation filtering model densely samples and learns the target region and the background region using the learning model of a background-aware correlation filter, specifically as follows:
an initial background-aware correlation filtering model is established and the desired filter response is designed according to the aspect ratio of the reference image; the learning model of the background-aware correlation filter shown in Equation 2 is established and solved rapidly in the Fourier domain with the ADMM algorithm; the optimal solution obtained is the background-aware correlation filtering model:

f = argmin_f (1/2)·||y − Σ_{c=1..C} x_c ⊛ f_c||²,  with f_c = w_o ⊙ p_c

where argmin is the function solving for the optimal solution; c is the feature-channel index and C the number of feature channels; f is a set of filters and f_c its c-th feature channel; x is the input feature and x_c the input feature of the c-th channel; ⊛ denotes circular convolution and ⊙ the element-wise product; w_o is a weight map and p the set of background-aware correlation filters to be learned, p_c its c-th channel; w_o gives high weight to the target region of p_c and low weight to the background region of p_c; y is the desired output, a two-dimensional Gaussian distribution.
Further, detecting the changed regions in the registered image to be detected with a spatial-information-enhanced Unet classification model means using a Unet network as the basic structure, stacking convolution layers, batch normalization layers, rectified linear unit layers, and max pooling layers in the encoder module to extract high-dimensional features from the input reference image and image to be detected, and using an attention-network-based change detection technique to assign different importance to pixels at different positions in the image to be detected and classify them as changed or unchanged pixels.
Further, the spatial information enhanced Unet classification model is obtained through the following steps:
3-1) define the encoder feature map as F_1 and the decoder feature map as G_1, both of size H×W×C, where H is the number of rows of the feature matrix, W the number of columns, and C the number of feature channels;
3-2) transform the encoder and decoder feature maps with a convolution layer of kernel size 3×3 and balance the feature distribution with a batch normalization layer, obtaining the convolved and normalized encoder feature map F_2 and decoder feature map G_2;
3-3) fuse F_2 and G_2 over the channels at each spatial position to obtain the fused feature map F_fuse, which represents the importance of each spatial position, and activate the fused feature map with a ReLU function; F_fuse is defined as:

F_fuse = f((B(F_1 * W_1) + B(G_1 * W_2)) * W_3)

where * is the convolution operation, B the batch normalization layer, W_1 and W_2 the 3×3 convolution kernels in the encoder and decoder respectively, W_3 a 1×1 convolution kernel, and f the ReLU function, f(x) = max(0, x), with x the input feature;
3-4) convolve the fused encoder feature map F_2 and decoder feature map G_2 with a kernel of size 1×1×C and feed the result to a sigmoid function to obtain a weight factor for each spatial position;
3-5) rescale the activation amplitude of the decoder feature map G_1 with the output of the sigmoid function to obtain the final output F_final of the fused spatial attention mechanism module, defined as:

F_final = G_1 ⊙ sigmoid(F_fuse)

where ⊙ denotes element-wise multiplication at each spatial position.
further, before the image registration, the method further comprises the step of adopting a non-local sparse model to carry out filtering processing on the reference image and the image to be detected respectively.
Further, the filtering processing of the reference image and the image to be detected by adopting the non-local sparse model comprises the following steps:
1-1) for each pixel in the image, take the N×N neighborhood centered on the pixel as its image block (N is a number of pixels) and the M×M range centered on the pixel as its search box (M is a number of pixels); compute the Euclidean distance between the image block and each equally sized block in the search box, and treat blocks whose distance falls below a threshold as similar image blocks, giving the similar set of each pixel in the image;
1-2) analyze the structure and size of the similar set and construct the sparse-representation dictionary from it; when the similar set is smaller than a set threshold, use the dictionary of the K-SVD method as the dictionary of the similar set; when the similar set is larger than the threshold, solve the dictionary of the similar set by iterative computation combined with the SOMP algorithm;
1-3) using the sparse-representation dictionary, sparsely decompose and reconstruct the similar set of each pixel in the image with the SOMP algorithm, and solve the sparse coefficient matrix of the image;
1-4) each pixel in the image has multiple denoising results; sum all the denoising results of each pixel and take the average as the pixel's final denoising result, thus obtaining the denoising result of the whole image.
In another aspect, the present invention also provides a SAR image change detection apparatus based on a spatial attention model, the apparatus including a memory and a processor; the memory stores a computer program that is executed by the processor to implement the steps of the method described above.
In yet another aspect, the present invention also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the above method.
The SAR image change detection method and equipment based on the spatial attention model have the following beneficial effects:
aiming at the inherent speckle noise problem of the SAR image, the non-local sparse model is utilized to reduce and enhance the image, so that the SAR image is more uniform and smooth, and a clear and complete target area can be obtained.
Aiming at the problem of low SAR image matching precision, a background sensing filter learning method is provided, and the method has the characteristics of flicker resistance, noise resistance and the like, and realizes SAR image fast matching.
Aiming at the problem of low SAR image change detection robustness under a complex background in the prior art, a spatial CNN and a spatial attention mechanism are introduced to construct a spatial information enhanced Unet, so that the false alarm brought by the specific speckle noise of the SAR image is reduced, the sensitivity of the model to the complex background environment is reduced, and the model is ensured to have good robustness to various complex background environment SAR images.
Drawings
FIG. 1 is a flow chart of a method of an embodiment of the present invention.
Fig. 2 is a schematic diagram of a SAR image change detection process according to an embodiment of the present invention, in which (a) is a reference image, (b) is an image to be detected, (c) is a target in which the image to be detected disappears compared with the reference image, and (d) is a target in which the image to be detected appears compared with the reference image.
Detailed Description
The invention is described in further detail below with reference to the examples and the accompanying drawings.
Example 1:
In one embodiment of the invention, two SAR images of a typical ground-feature scene are selected as experimental data; the research area is a test-field area containing features such as buildings, roads, trees, and shadows. A change detection experiment is performed with two SAR images of the same scene from different time periods: the SAR image of the earlier period serves as the reference image, and the SAR image of the current period is the image to be detected. As shown in fig. 1, the specific steps are as follows:
(1) Image preprocessing (optional)
The non-local sparse model is used to filter the reference image (see (a) in fig. 2) and the image to be detected (see (b) in fig. 2), reducing image speckle noise while preserving detail information such as image edges and contours. Sparse representation achieves image denoising by designing an effective dictionary and reconstructing the image signal from a small amount of information.
The filtering processing of the reference image and the image to be detected by adopting the non-local sparse model comprises the following steps:
1-1) for each pixel in the image, take the N×N neighborhood centered on the pixel as its image block (N is a number of pixels) and the M×M range centered on the pixel as its search box (M is a number of pixels); compute the Euclidean distance between the image block and each equally sized block in the search box, and treat blocks whose distance falls below a threshold as similar image blocks, giving the similar set of each pixel in the image;
1-2) analyze the structure and size of the similar set and construct the sparse-representation dictionary from it; when the similar set is smaller than a set threshold, use the dictionary of the K-SVD method as the dictionary of the similar set; when the similar set is larger than the threshold, solve the dictionary of the similar set by iterative computation combined with the SOMP algorithm;
1-3) using the sparse-representation dictionary, sparsely decompose and reconstruct the similar set of each pixel in the image with the SOMP algorithm, and solve the sparse coefficient matrix of the image;
1-4) each pixel in the image has multiple denoising results; sum all the denoising results of each pixel and take the average as the pixel's final denoising result, thus obtaining the denoising result of the whole image.
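The similar-set search and averaging in steps 1-1) and 1-4) can be sketched as follows. This is a deliberately simplified, hypothetical implementation: the K-SVD/SOMP sparse coding of steps 1-2) and 1-3) is replaced by plain averaging of the similar pixels, so only the patch-similarity logic of the patent's pipeline is shown. The function name and parameters are illustrative, not from the patent.

```python
import numpy as np

def nonlocal_denoise(img, patch=3, search=7, thresh=30.0):
    """Simplified non-local denoising sketch.

    For each pixel, patches inside a search window whose Euclidean
    distance to the reference patch falls below `thresh` are treated
    as similar, and the center pixels of those patches are averaged.
    The sparse-dictionary reconstruction of the original method is
    omitted for brevity.
    """
    img = img.astype(np.float64)
    r = patch // 2            # patch radius (patch size N = patch)
    s = search // 2           # search radius (search size M = search)
    pad = r + s
    padded = np.pad(img, pad, mode="reflect")
    out = np.zeros_like(img)
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            ci, cj = i + pad, j + pad
            ref = padded[ci - r:ci + r + 1, cj - r:cj + r + 1]
            acc, n = 0.0, 0
            for di in range(-s, s + 1):
                for dj in range(-s, s + 1):
                    qi, qj = ci + di, cj + dj
                    cand = padded[qi - r:qi + r + 1, qj - r:qj + r + 1]
                    if np.linalg.norm(cand - ref) < thresh:
                        acc += padded[qi, qj]
                        n += 1
            # the reference patch always matches itself, so n >= 1
            out[i, j] = acc / n
    return out
```

A constant image passes through unchanged, since every candidate patch is at distance zero from the reference patch.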
(2) Image matching
A SAR image of the same scene from a different time period is selected as the reference image for the image to be detected, and the periphery of the image to be detected is filled with background so that the two images have the same size; the filled background can be used by the background-aware correlation filtering model to learn to distinguish background from target. HOG (Histogram of Oriented Gradients) features and gray-level features are extracted from the reference image and the image to be detected, and fast matching between the two is computed in the Fourier domain with the background-aware correlation filtering model to obtain a filter response map; the maximum response position on the response map (found, for example, by Newton's iteration) is taken as the matched position, giving the registered image to be detected. After image matching, the reference image and the image to be detected are geometrically consistent.
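The Fourier-domain matching idea can be sketched with plain circular cross-correlation: the smaller image is zero-padded to the reference size (mirroring the background-filling step above) and the peak of the response map gives the translation. This is a hedged sketch of the correlation theorem only, not the patent's learned background-aware filter; the function name is illustrative.

```python
import numpy as np

def match_offset(reference, template):
    """Locate `template` inside `reference` via correlation in the
    Fourier domain. Returns the (row, col) offset of the best match.

    Uses the correlation theorem: the circular cross-correlation of
    two images equals IFFT(FFT(a) * conj(FFT(b))). This unnormalized
    correlation stands in for the filter response map of the method
    above; the learned filter would replace the raw template here.
    """
    H, W = reference.shape
    padded = np.zeros((H, W))
    padded[:template.shape[0], :template.shape[1]] = template
    response = np.real(np.fft.ifft2(np.fft.fft2(reference) *
                                    np.conj(np.fft.fft2(padded))))
    dy, dx = np.unravel_index(np.argmax(response), response.shape)
    return int(dy), int(dx)
```

For a bright block placed at row 5, column 7 of an otherwise dark reference, the peak of the response map lands at (5, 7).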
The background-aware correlation filtering model is described by Equation 3, the ADMM iteration:

f^(k+1) = argmin_f (1/2)·||y − Σ_{c=1..C} x_c ⊛ f_c||² + (η/2)·Σ_{c=1..C} ||f_c − w·p_c^(k) + h_c^(k)||²
p^(k+1) = argmin_p (η/2)·Σ_{c=1..C} ||f_c^(k+1) − w·p_c + h_c^(k)||²
h^(k+1) = h^(k) + f^(k+1) − w·p^(k+1)

where f is a set of filters, f^(k+1) its value at iteration k+1, and f_c its c-th feature channel; h is another set of filters (the Lagrange multipliers), with h^(k+1) and h_c defined analogously; argmin is the function solving for the optimal solution; c is the feature-channel index and C the number of feature channels; x is the input feature and x_c the input feature of the c-th channel; ⊛ denotes circular convolution; y is the expected output, a two-dimensional Gaussian distribution; η is a preset constant; p is the set of background-aware correlation filters to be learned, with p^(k+1) and p^(k) its values at iterations k+1 and k and p_c its c-th channel; and w is the preset feature-channel weight.
The background-aware correlation filtering model densely samples and learns the target region and the background region using the learning model of a background-aware correlation filter, specifically as follows:
and establishing an initial background perception related filtering model of the reference formula 1, and designing a filtering expected response value according to the aspect ratio of the reference image.
Where x represents a cyclic convolution, p represents a per-element product, p is the filter to be learned, p c Is the filter of the c-th characteristic channel, w o Is a weight map given to p c The corresponding target area is given high weight, the background area is given low weight, C is the characteristic channel number, x is the input characteristic, x c Is the input feature of the c-th feature channel, y is the desired output, and is a two-dimensional gaussian distribution.
Letting w_o ⊙ p_c = f_c, Equation 1 transforms into Equation 2; the learning model of the background-aware correlation filter is established as in Equation 2 and solved rapidly in the Fourier domain with the ADMM algorithm; the optimal solution obtained is the background-aware correlation filtering model:

f = argmin_f (1/2)·||y − Σ_{c=1..C} x_c ⊛ f_c||²,  with f_c = w_o ⊙ p_c

where argmin is the function solving for the optimal solution; c is the feature-channel index and C the number of feature channels; f is a set of filters and f_c its c-th feature channel; x is the input feature and x_c the input feature of the c-th channel; ⊛ denotes circular convolution and ⊙ the element-wise product; w_o is a weight map and p the set of background-aware correlation filters to be learned, p_c its c-th channel; w_o gives high weight to the target region of p_c and low weight to the background region of p_c; y is the desired output, a two-dimensional Gaussian distribution.
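Why the Fourier domain makes this solve fast can be seen in the single-channel special case: with circular convolution, the least-squares filter has a closed form per frequency. The sketch below is a MOSSE-style ridge regression under that assumption, not the patent's multichannel weighted ADMM model; the function names and the regularizer `lam` are illustrative additions.

```python
import numpy as np

def learn_filter(x, y, lam=1e-3):
    """Single-channel correlation-filter learning sketch.

    Minimizes ||y - x (*) f||^2 + lam*||f||^2 with (*) circular
    convolution. By the convolution theorem this decouples per
    frequency, giving the closed form F = conj(X)*Y / (|X|^2 + lam),
    which is why Fourier-domain solves (and the ADMM subproblems of
    the full model) are cheap.
    """
    X = np.fft.fft2(x)
    Y = np.fft.fft2(y)
    F = np.conj(X) * Y / (np.conj(X) * X + lam)   # per-frequency solve
    return np.real(np.fft.ifft2(F))

def apply_filter(x, f):
    """Response map of filter f on input x via the convolution theorem."""
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(f)))
```

Training against a two-dimensional Gaussian desired output y, as in the model above, makes the response of the learned filter peak at the Gaussian's center.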
(3) Change detection
Preferably, aiming at the low robustness of SAR image change detection under complex backgrounds in the prior art, a spatial CNN and a spatial attention mechanism are introduced into the Unet structure. Because the Unet network is flexibly extensible and robust, it is taken as the basic structure; its encoding and decoding modules are retained to construct the spatial-information-enhanced Unet, and the spatial-information-enhanced Unet classification model is used to detect the changed regions in the registered image to be detected.
Detecting the changed regions in the registered image to be detected with the spatial-information-enhanced Unet classification model means using a Unet network as the basic structure, stacking convolution layers, batch normalization (BN) layers, rectified linear unit (ReLU) layers, and max-pooling layers in the encoder module to extract high-dimensional features from the input reference image and image to be detected, and using an attention-network-based change detection technique to assign different importance to pixels at different positions in the image to be detected and classify them as changed or unchanged pixels. This reduces the false alarms caused by SAR-specific speckle noise and the model's sensitivity to complex background environments, and ensures good robustness of the model to SAR images of various complex background environments, as shown in (c) and (d) of fig. 2.
The Unet classification model with enhanced spatial information is obtained through the following steps:
3-1) define the encoder feature map as F_1 and the decoder feature map as G_1, both of size H×W×C, where H is the number of rows of the feature matrix, W the number of columns, and C the number of feature channels;
3-2) transform the encoder and decoder feature maps with a convolution layer of kernel size 3×3 and balance the feature distribution with a batch normalization layer, obtaining the convolved and normalized encoder feature map F_2 and decoder feature map G_2;
3-3) fuse F_2 and G_2 over the channels at each spatial position to obtain the fused feature map F_fuse, which represents the importance of each spatial position, and activate the fused feature map with a ReLU function; F_fuse is defined as:

F_fuse = f((B(F_1 * W_1) + B(G_1 * W_2)) * W_3)

where * is the convolution operation, B the batch normalization layer, W_1 and W_2 the 3×3 convolution kernels in the encoder and decoder respectively, W_3 a 1×1 convolution kernel, and f the ReLU function, f(x) = max(0, x), with x the input feature;
3-4) convolve the fused encoder feature map F_2 and decoder feature map G_2 with a kernel of size 1×1×C and feed the result to a sigmoid function to obtain a weight factor for each spatial position;
3-5) rescale the activation amplitude of the decoder feature map G_1 with the output of the sigmoid function to obtain the final output F_final of the fused spatial attention mechanism module, defined as:

F_final = G_1 ⊙ sigmoid(F_fuse)

where ⊙ denotes element-wise multiplication at each spatial position.
the spatial attention mechanism module gives different importance to different locations through the steps described above. For example, the trace location weight factor should be larger and the background location weight factor should be smaller. Furthermore, since the attention mechanism in the proposed detection network is composed of the base layer in CNN, such as the convolutional layer and the active layer, the whole forward propagation is minimal, which means that the attention module used can back propagate the loss to the first few layers, thus achieving the whole network end-to-end training and optimization.
Given an image block of size 512×512, the encoder module produces 512 feature maps of size 32×32, after which the spatial CNN enlarges the receptive field of the feature maps. The decoding module uses up-sampling, convolution, and BN layers to restore the original size of the input image, realizing the final classification: changed pixels take the value 1 and unchanged pixels the value 0.
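The 512×512 to 32×32 reduction is consistent with four 2×2 max-pooling stages (512 / 2^4 = 32), the usual Unet encoder depth; the helper below only checks this shape arithmetic and assumes that pooling count, which the patent does not state explicitly.

```python
def encoder_output_size(size=512, pools=4):
    """Spatial size after `pools` halving (2x2 max-pool) stages.

    Each max-pool halves the spatial size, so a 512 x 512 input
    reaches 32 x 32 after four pooling stages, matching the
    feature-map size quoted above.
    """
    for _ in range(pools):
        size //= 2
    return size
```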
According to the SAR image change detection method and device based on the spatial attention model, the non-local sparse model is used to denoise and enhance the image against the inherent speckle noise of SAR images, making the SAR image more uniform and smooth so that a clear and complete target region can be obtained. Aiming at the low matching accuracy of SAR images, a background-aware filter learning method is proposed that is resistant to flicker and noise and achieves fast SAR image matching. Aiming at the low robustness of SAR image change detection under complex backgrounds in the prior art, a spatial CNN and a spatial attention mechanism are introduced to construct a spatial-information-enhanced Unet, reducing the false alarms caused by SAR-specific speckle noise and the model's sensitivity to complex background environments, and ensuring good robustness of the model to SAR images of various complex background environments.
In some embodiments, certain aspects of the techniques described above may be implemented by one or more processors of a processing system executing software. The software includes one or more sets of executable instructions stored or otherwise tangibly embodied on a non-transitory computer-readable storage medium. The software may include instructions and certain data that, when executed by one or more processors, operate the one or more processors to perform one or more aspects of the techniques described above. The non-transitory computer-readable storage medium may include, for example, a magnetic or optical disk storage device, a solid-state storage device such as flash memory, a cache, random access memory (RAM), or other non-volatile memory device. Executable instructions stored on a non-transitory computer-readable storage medium may be in source code, assembly language code, object code, or other instruction format that is interpreted or otherwise executed by one or more processors.
A computer-readable storage medium may include any storage medium or combination of storage media that can be accessed by a computer system during use to provide instructions and/or data to the computer system. Such storage media may include, but is not limited to, optical media (e.g., Compact Disc (CD), Digital Versatile Disc (DVD), Blu-ray disc), magnetic media (e.g., floppy disk, magnetic tape, or magnetic hard drive), volatile memory (e.g., Random Access Memory (RAM) or cache), non-volatile memory (e.g., Read-Only Memory (ROM) or flash memory), or microelectromechanical system (MEMS) based storage media. The computer-readable storage medium may be embedded in a computing system (e.g., system RAM or ROM), fixedly attached to the computing system (e.g., a magnetic hard drive), removably attached to the computing system (e.g., an optical disc or Universal Serial Bus (USB) based flash memory), or coupled to the computer system via a wired or wireless network (e.g., network-accessible storage (NAS)).
While the invention has been disclosed in terms of preferred embodiments, these embodiments are not intended to limit the invention. Any equivalent changes or modifications made without departing from the spirit and scope of the invention are intended to fall within its scope. The scope of the invention should therefore be determined by the appended claims.

Claims (9)

1. A method for detecting SAR image change based on a spatial attention model, comprising:
image matching: selecting, as a reference image, a SAR image of the same scene as the image to be detected but from a different time period; background-filling the periphery of the image to be detected so that the image to be detected and the reference image have the same size; extracting HOG features and gray-level features of the reference image and the image to be detected; and performing fast matching computation between the image to be detected and the reference image in the Fourier domain using a background-aware correlation filtering model to obtain a filtering response map, i.e., the image to be detected after image registration;
change detection: detecting a changed region in the image to be detected after image registration using a spatially information-enhanced Unet classification model.
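The Fourier-domain matching step of claim 1 can be sketched as follows. This is a simplified, single-channel cross-correlation stand-in: the patent extracts HOG and gray-level features and applies a learned background-aware correlation filter, whereas here the reference image itself plays the role of the filter; the function name and interface are illustrative assumptions.

```python
import numpy as np

def match_translation(reference, to_detect):
    """Estimate the translation aligning `to_detect` to `reference` by
    cross-correlation evaluated in the Fourier domain (a simplified,
    single-channel stand-in for the learned multi-feature filter)."""
    H, W = reference.shape
    h, w = to_detect.shape
    # Background-fill (here: zero-pad) the image to be detected so both
    # images share the reference size, as required before matching.
    padded = np.zeros((H, W), dtype=float)
    padded[:h, :w] = to_detect
    # Cross-correlation in the Fourier domain: F(a) * conj(F(b)).
    R = np.fft.fft2(reference) * np.conj(np.fft.fft2(padded))
    response = np.real(np.fft.ifft2(R))  # filtering response map
    dy, dx = np.unravel_index(np.argmax(response), response.shape)
    # Interpret peaks past the midpoint as negative (circular) shifts.
    if dy > H // 2:
        dy -= H
    if dx > W // 2:
        dx -= W
    return dy, dx
```

The returned offset is the circular shift that re-aligns the image to be detected with the reference; the full method instead reads the registration off the peak of the learned filter's response map.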
2. The SAR image change detection method based on the spatial attention model of claim 1, wherein the background-aware correlation filtering model is described by formula 3:
where f is a set of filters, f^(k+1) is the value of that set of filters at the (k+1)-th iteration, and f_c is its c-th feature channel; h is another set of filters, h^(k+1) is the value of that set of filters at the (k+1)-th iteration, and h_c is its c-th feature channel; argmin is the function returning the optimal solution; c is the feature-channel index and C is the number of feature channels; x is the input feature and x_c is the input feature of the c-th feature channel; ⋆ denotes cyclic convolution; y is the desired output, a two-dimensional Gaussian distribution; η is a predetermined constant; p is the set of background-aware correlation filters to be learned, p^(k+1) is its value at the (k+1)-th iteration, p^(k) is its value at the k-th iteration, and p_c is its c-th feature channel; and w is the preset feature-channel weight.
3. The SAR image change detection method based on the spatial attention model according to claim 1, wherein the background-aware correlation filtering model is obtained by densely sampling the target region and the background region and learning with the learning model of a background-aware correlation filter, specifically comprising:
establishing an initial background-aware correlation filtering model; designing the expected filter response according to the aspect ratio of the reference image; establishing the learning model of the background-aware correlation filter shown in formula 2; and rapidly solving that learning model in the Fourier domain using the ADMM algorithm to obtain the optimal solution, yielding the background-aware correlation filtering model:
where argmin is the function returning the optimal solution; C is the number of feature channels; f is a set of filters and f_c is its c-th feature channel; x is the input feature and x_c is the c-th feature channel of the input feature; ⋆ denotes cyclic convolution and ⊙ denotes the element-wise product; w_o is a weight map; p is the set of background-aware correlation filters to be learned and p_c is its c-th feature channel; w_o assigns a high weight to the target region of p_c and a low weight to the background region of p_c; and y is the desired output, a two-dimensional Gaussian distribution.
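The learning model of claim 3 can be illustrated with a drastic simplification. Formula 2 (its image is not reproduced here) is a multi-channel, background-weighted correlation filter objective solved with ADMM; the sketch below instead learns a single-channel filter in closed form via Fourier-domain ridge regression (MOSSE-style), which shares the same structure: filter ⋆ feature ≈ Gaussian-shaped desired output y. The function names and the regularization weight `lam` are illustrative assumptions.

```python
import numpy as np

def gaussian_label(H, W, sigma=2.0):
    """Desired filter output y: a 2-D Gaussian centred on the target,
    shifted so its peak sits at (0, 0) as is usual for correlation filters."""
    ys = np.arange(H) - H // 2
    xs = np.arange(W) - W // 2
    g = np.exp(-(ys[:, None] ** 2 + xs[None, :] ** 2) / (2 * sigma ** 2))
    return np.fft.ifftshift(g)

def learn_filter(x, y, lam=1e-2):
    """Closed-form single-channel correlation filter: per-frequency
    ridge regression F = conj(X) Y / (conj(X) X + lam). A simplified
    stand-in for the ADMM solution of the background-weighted model."""
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    F = (np.conj(X) * Y) / (np.conj(X) * X + lam)
    return np.real(np.fft.ifft2(F))

def apply_filter(f, x):
    """Cyclic convolution f ⋆ x evaluated in the Fourier domain."""
    return np.real(np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(x)))
```

Applying the learned filter to its training feature reproduces (approximately) the desired Gaussian response, with the peak marking the target position.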
4. The SAR image change detection method based on the spatial attention model according to claim 1, wherein detecting a changed region in the image to be detected after image registration using a spatially information-enhanced Unet classification model means: using a Unet network as the basic structure, stacking convolution layers, batch normalization layers, rectified linear unit (ReLU) layers and max pooling layers in the encoder module, and extracting high-dimensional features from the input reference image and image to be detected; and using an attention-network-based change detection technique to assign different importance to pixels at different positions in the image to be detected, classifying each as a changed or unchanged pixel.
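One encoder stage of the Unet described in claim 4 can be sketched as below. This is a single-channel toy version (the real encoder stacks many multi-channel stages); the function name and kernel argument are illustrative assumptions, and batch normalization is reduced to whole-map standardization.

```python
import numpy as np

def conv_bn_relu_pool(x, k, eps=1e-5):
    """One encoder stage: 3x3 'same' convolution (cross-correlation
    form), batch normalization, ReLU, then 2x2 max pooling.
    x: H x W feature map with even H, W; k: 3x3 kernel."""
    H, W = x.shape
    p = np.pad(x, 1, mode='edge')
    conv = np.zeros_like(x, dtype=float)
    for i in range(3):
        for j in range(3):
            conv += k[i, j] * p[i:i + H, j:j + W]
    bn = (conv - conv.mean()) / np.sqrt(conv.var() + eps)  # batch norm
    act = np.maximum(0.0, bn)                              # ReLU
    # 2x2 max pooling halves each spatial dimension.
    return act.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))
```

Stacking such stages halves the resolution repeatedly while the (omitted here) channel count grows, producing the high-dimensional features the claim refers to.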
5. The SAR image change detection method based on the spatial attention model according to claim 4, wherein the spatially information-enhanced Unet classification model is obtained by:
3-1) defining the encoder feature map as F1 and the decoder feature map as G1, both of size H×W×C, where H is the number of rows of the feature matrix, W is the number of columns of the feature matrix, and C is the number of feature channels;
3-2) transforming the encoder and decoder feature maps using convolution layers with a 3×3 kernel and balancing the feature distributions using batch normalization layers, obtaining the convolved-and-normalized encoder feature map F2 and decoder feature map G2;
3-3) fusing, at each spatial position across channels, the convolved-and-normalized encoder feature map F2 and decoder feature map G2 to obtain a fused feature map F_fuse representing the importance of each spatial position, and activating the fused feature map through a ReLU function; F_fuse is defined as:
F_fuse = f((B(F1 * W1) + B(G1 * W2)) * W3)
where * is the convolution operation, B is the batch normalization layer, W1 and W2 are the 3×3 convolution kernels in the encoder and decoder respectively, W3 is a 1×1 convolution kernel, and f is the ReLU function, f(x) = max(0, x), where x is the input feature;
3-4) convolving the fused map of the encoder feature map F2 and decoder feature map G2 with a convolution kernel of size 1×1×C, then inputting the result to a sigmoid function to obtain the weight factor of each spatial position;
3-5) rescaling the activation amplitude of the decoder feature map G1 with the output of the sigmoid function to obtain the final output F_final of the fused spatial attention mechanism module, defined as:
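Steps 3-1) to 3-5) can be sketched as a single fusion function. Two deliberate simplifications, beyond what the claim states: the 3×3 convolutions are replaced by per-pixel C×C channel-mixing matrices, and batch normalization is omitted; the function name and argument names are illustrative assumptions.

```python
import numpy as np

def relu(a):
    return np.maximum(0.0, a)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def spatial_attention(F1, G1, W1, W2, W3):
    """Spatial attention fusion sketch. F1/G1: H x W x C encoder/decoder
    feature maps; W1, W2: C x C channel mixers standing in for the 3x3
    convolutions; W3: length-C vector standing in for the 1x1xC kernel."""
    # Steps 3-2)/3-3): transform both maps, add, activate with ReLU.
    F_fuse = relu(np.einsum('hwc,cd->hwd', F1, W1)
                  + np.einsum('hwc,cd->hwd', G1, W2))
    # Step 3-4): 1x1xC convolution + sigmoid -> one weight per position.
    weights = sigmoid(np.einsum('hwc,c->hw', F_fuse, W3))
    # Step 3-5): rescale the decoder activations position-wise.
    return G1 * weights[:, :, None]
```

Because the sigmoid output lies in (0, 1), the module can only attenuate decoder activations, suppressing positions the fused map deems unimportant.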
6. the method for detecting changes in SAR image based on a spatial attention model according to claim 1, further comprising filtering the reference image and the image to be detected, respectively, with a non-local sparse model prior to said image registration.
7. The method for detecting SAR image variation based on spatial attention model according to claim 6, wherein said filtering the reference image and the image to be detected with the non-local sparse model respectively comprises:
1-1) for each pixel in the image, taking the N×N neighborhood centered on that pixel as an image block, N being a number of pixels; taking the M×M range centered on that pixel as the corresponding search window, M being a number of pixels; computing the Euclidean distance between the image block centered on each pixel in the corresponding search window and the image block of the current pixel; and, when the distance is smaller than a given threshold, treating the two as similar image blocks, thereby obtaining the similar set of each pixel in the image;
1-2) analyzing the structure and size of the similar set and constructing the dictionary for sparse representation from it: when the similar set is smaller than a set threshold, selecting the dictionary used for sparse representation in the K-SVD method as the dictionary of the similar set; when the similar set is larger than the specified threshold, solving the dictionary of the similar set by an iterative computation combined with the SOMP algorithm;
1-3) according to the sparse-representation dictionary, performing sparse decomposition and reconstruction on the similar set of each pixel in the image using the SOMP algorithm, and solving the sparse coefficient matrix of the image;
1-4) each pixel in the image having several denoising results, summing all the denoising results of each pixel and taking the average as the final denoising result of that pixel, thereby obtaining the denoising result of the whole image.
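Steps 1-1) to 1-4) can be illustrated with a much-simplified non-local filter. The patent sparsely codes each similar set with a K-SVD/SOMP dictionary; the sketch below replaces that with plain averaging of the similar-patch centres, keeping only the patch-similarity grouping. The function name and parameter defaults are illustrative assumptions.

```python
import numpy as np

def nonlocal_denoise(img, patch=3, search=7, threshold=1.0):
    """For every pixel, patches inside the search window whose Euclidean
    distance to the local patch falls below `threshold` form the similar
    set; the pixel is replaced by the average of their centre values."""
    r, s = patch // 2, search // 2
    pad = np.pad(img, r + s, mode='reflect')
    out = np.zeros_like(img, dtype=float)
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            ci, cj = i + r + s, j + r + s  # centre in padded image
            ref = pad[ci - r:ci + r + 1, cj - r:cj + r + 1]
            total, count = 0.0, 0
            for di in range(-s, s + 1):
                for dj in range(-s, s + 1):
                    cand = pad[ci + di - r:ci + di + r + 1,
                               cj + dj - r:cj + dj + r + 1]
                    if np.linalg.norm(cand - ref) < threshold:
                        total += pad[ci + di, cj + dj]
                        count += 1
            out[i, j] = total / count  # the reference patch always matches
    return out
```

On a noisy but structurally flat region, most candidate patches pass the similarity test, so the output variance drops sharply, which is the smoothing effect the method relies on for speckle suppression.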
8. A SAR image change detection device based on a spatial attention model, the device comprising a memory and a processor; the memory stores a computer program, which is executed by the processor to implement the steps of the method according to any of claims 1-7.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any of claims 1-7.
CN202311298068.8A 2023-10-09 2023-10-09 SAR image change detection method and equipment based on spatial attention model Pending CN117392187A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311298068.8A CN117392187A (en) 2023-10-09 2023-10-09 SAR image change detection method and equipment based on spatial attention model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311298068.8A CN117392187A (en) 2023-10-09 2023-10-09 SAR image change detection method and equipment based on spatial attention model

Publications (1)

Publication Number Publication Date
CN117392187A true CN117392187A (en) 2024-01-12

Family

ID=89438250

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311298068.8A Pending CN117392187A (en) 2023-10-09 2023-10-09 SAR image change detection method and equipment based on spatial attention model

Country Status (1)

Country Link
CN (1) CN117392187A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117765402A (en) * 2024-02-21 2024-03-26 山东科技大学 Hyperspectral image matching detection method based on attention mechanism
CN117765402B (en) * 2024-02-21 2024-05-17 山东科技大学 Hyperspectral image matching detection method based on attention mechanism

Similar Documents

Publication Publication Date Title
CN106909924B (en) Remote sensing image rapid retrieval method based on depth significance
CN108346159B (en) Tracking-learning-detection-based visual target tracking method
CN108230329B (en) Semantic segmentation method based on multi-scale convolution neural network
Xie et al. Multilevel cloud detection in remote sensing images based on deep learning
CN108734723B (en) Relevant filtering target tracking method based on adaptive weight joint learning
Han et al. Joint spatial-spectral hyperspectral image classification based on convolutional neural network
Khoshboresh-Masouleh et al. Multiscale building segmentation based on deep learning for remote sensing RGB images from different sensors
CN111738344B (en) Rapid target detection method based on multi-scale fusion
CN112800964A (en) Remote sensing image target detection method and system based on multi-module fusion
CN103632153B (en) Region-based image saliency map extracting method
CN110490924B (en) Light field image feature point detection method based on multi-scale Harris
CN117392187A (en) SAR image change detection method and equipment based on spatial attention model
CN113920538B (en) Object detection method, device, equipment, storage medium and computer program product
Peng et al. Unsupervised change detection method based on saliency analysis and convolutional neural network
CN112116720A (en) Three-dimensional point cloud augmentation method and device, storage medium and computer equipment
CN117409190B (en) Real-time infrared image target detection method, device, equipment and storage medium
Yu et al. Traffic sign detection based on visual co-saliency in complex scenes
Fan et al. A novel sonar target detection and classification algorithm
CN111415370A (en) Embedded infrared complex scene target real-time tracking method and system
CN114926826A (en) Scene text detection system
CN107358625B (en) SAR image change detection method based on SPP Net and region-of-interest detection
KR20230036030A (en) Method and system for detecting anomalies in images using a plurality of machine learning programs
CN114241470A (en) Natural scene character detection method based on attention mechanism
CN114511631A (en) Method and device for measuring height of visual object of camera and computer readable storage medium
CN113112547A (en) Robot, repositioning method thereof, positioning device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination