CN115496980A - Remote sensing image tampered target detection method and system based on multi-view features - Google Patents

Remote sensing image tampered target detection method and system based on multi-view features

Info

Publication number
CN115496980A
Authority
CN
China
Prior art keywords
boundary
remote sensing
module
sensing image
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211155410.4A
Other languages
Chinese (zh)
Inventor
武星
李攀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology filed Critical University of Shanghai for Science and Technology
Priority to CN202211155410.4A
Publication of CN115496980A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Abstract

The invention discloses a remote sensing image tampered target detection method and system based on multi-view features, comprising: a boundary detection module that detects the boundary artifacts of a tampered target subjected to camouflage masking; a noise detection module that captures the difference in noise distribution between the tampered area and the real area of the remote sensing image; a dual attention feature fusion module that fuses channel features using a channel attention fusion network; and multi-scale loss supervision that jointly trains the model with losses at three scales, improving its generalization performance. The method and system establish an automatic anomaly detection model for tampered targets in remote sensing images and can accurately detect and locate image areas that have been tampered with.

Description

Remote sensing image tampered target detection method and system based on multi-view features
Technical Field
The invention relates to a remote sensing image tampered target detection method and system based on multi-view features, and belongs to the field of computers and remote sensing.
Background
The remote sensing image tampered target detection technology is an important intelligent technology for rapidly and accurately identifying whether a tampered target exists in a remote sensing image and positioning an area where the tampered target is located. The tampered target in the remote sensing image is usually subjected to certain camouflage masking processing, such as splicing, copying, deleting and the like, so that the visual identification degree between the sensitive target in the remote sensing image and the nearby background is reduced and eliminated, and the falsification is hidden.
For the problem of detecting tampered targets in remote sensing images, traditional machine learning methods are highly targeted and flexible in design, but the resulting models are rigid and lack robustness. In recent years, deep learning has attracted much attention in remote sensing image processing, and many target detection algorithms based on convolutional neural networks have been proposed and applied to target detection in remote sensing images. Compared with traditional algorithms, a convolutional neural network can learn higher-level semantic information from the remote sensing image, so it is more robust and better suited to the task of detecting tampered targets in remote sensing images.
In recent years, many works explore a task of detecting a remote sensing image tampered target, and put forward many new ideas based on a deep learning target detection network for improvement, but the following problems still exist in the existing methods: (1) When a traditional semantic segmentation method identifies a tampering target in a remote sensing image, the model is easy to capture too many useless semantic features; (2) The boundary difference between a tampered target area and a real area is not effectively utilized, and the boundary artifact characteristics around the hidden target area are captured; (3) Only the visual RGB domain characteristics of the remote sensing image area are concerned, and the characteristic fusion and characteristic difference analysis aiming at the high-frequency noise characteristics are lacked.
Disclosure of Invention
The purpose of the invention is: the visual feature expression of the remote sensing image is improved, the tampered boundary artifact can be found, the noise feature of the tampered region is captured, and the performance of the tamper detection model is improved by means of multi-scale supervision loss.
In order to achieve the above object, one technical solution of the present invention is to provide a remote sensing image tampered object detection system based on multi-view features, which is characterized by comprising a boundary detection module, a noise detection module, a dual attention feature fusion module, a result output module, and a multi-scale supervision loss module, wherein:
the boundary detection module combines block features from different levels of a residual network in a progressive manner, detecting in the remote sensing image uploaded by the client the boundary artifacts produced by the camouflage masking of a hidden target, and thereby obtains a boundary artifact feature map;
the noise detection module carries out noise extraction on the remote sensing image uploaded by the client to obtain a noise image, and then captures the difference characteristic of noise distribution of a hidden target area and the rest normal areas as a universal non-semantic characteristic, so that a noise distribution characteristic diagram is obtained;
the dual attention feature fusion module is used for fusing a boundary artifact feature map obtained by the boundary detection module and a noise distribution feature map obtained by the noise detection module;
a result output module: identifying and outputting the tampered region coordinates and the tampered mode of the remote sensing image according to the output result of the dual attention feature fusion module;
and the multi-scale supervision loss module is used for jointly training a detection model consisting of a boundary detection module, a noise detection module, a double attention module and a result output module by using loss functions of three scales.
Preferably, the loss functions of the three scales include a pixel-level loss function, an image-level loss function and a boundary loss function, wherein the pixel-level loss function is used for improving the sensitivity of the model to the pixel-level operation detection, the image-level loss function is used for improving the specificity of the model to the image-level operation detection, and the boundary loss function is used for learning the non-semantic features, so that not only can the hidden target be accurately positioned, but also the strong generalization capability of the model is ensured.
Another technical solution of the present invention is to provide a method for detecting a remote sensing image tampered object based on multi-view characteristics, which is implemented based on the remote sensing image tampered object detection system, and is characterized by comprising the following steps:
step 1, constructing a detection model based on a boundary detection module, a noise detection module, a dual attention feature fusion module and a result output module;
step 2, training the detection model, comprising the following steps:
step 201, collecting sample data and constructing a training data set;
202, a multi-scale supervision loss module jointly trains a detection model by using loss functions of three scales;
the implementation of the detection model comprises the following steps:
step 2021, the boundary detection module combines the features of residual blocks at different levels in a progressive manner to detect the boundary artifacts of the hidden target produced by camouflage masking, obtaining K boundary artifact feature maps;
the output of the boundary detection module is passed through a sigmoid function to calculate the boundary detection probability of each pixel, as shown in the following formula:
G_edge(x_i) = σ(Sobel-ResNet(x_i))
where G_edge(x_i) is the boundary detection probability of the ith pixel x_i; σ is the sigmoid function; Sobel-ResNet(x_i) is the output of the ResNet residual network with Sobel layers for pixel x_i;
step 2022, extracting noise distribution in the remote sensing image by a noise detection module, capturing a difference characteristic of the noise distribution of the hidden target region and the rest normal regions of the remote sensing image through a residual error network, and taking the difference characteristic as a universal non-semantic characteristic to obtain final K noise distribution characteristic graphs;
step 2023, the dual attention feature fusion module fuses the boundary artifact feature maps of the K channels obtained by the boundary detection module and the noise distribution feature maps of the K channels obtained by the noise detection module to obtain a new fusion feature map f_fusion;
after bilinear upsampling of the fusion feature map f_fusion, the hidden target detection probability of each pixel is calculated through a sigmoid function, as shown in the following formula:
G(x_i) = σ(bilinear-sampling(f_fusion^i))
where G(x_i) is the hidden target detection probability of the ith pixel x_i; f_fusion^i is the ith channel of the fusion feature map; bilinear-sampling(f_fusion^i) denotes bilinear upsampling of f_fusion^i;
step 2024, the result output module identifies and outputs the tampered region coordinates and tampering mode of the remote sensing image based on the fusion feature map f_fusion;
the multi-scale supervised loss module uses three scales of loss functions including:
For the pixel-level loss, the Dice loss is used as loss_pixel:
loss_pixel = 1 - (2·Σ_{i=1}^{W×H} y_i·G(x_i)) / (Σ_{i=1}^{W×H} y_i + Σ_{i=1}^{W×H} G(x_i))
where W and H are the width and height of the remote sensing image, and y_i is the value of the ith pixel of the label map y.
For the image-level loss, the binary cross-entropy loss is used as loss_img:
loss_img = -(y·log G(x) + (1-y)·log(1-G(x)))
where G(x) is the detection probability of a hidden target in image x, and y indicates whether a hidden target exists in the image.
For the boundary loss, the Dice loss is also used as loss_edge to detect the boundary of the hidden target area produced by camouflage masking:
loss_edge = 1 - (2·Σ_{i=1}^{W×H} y'_i·G_edge(x_i)) / (Σ_{i=1}^{W×H} y'_i + Σ_{i=1}^{W×H} G_edge(x_i))
where y'_i ∈ {0,1} indicates whether the ith pixel belongs to the boundary of the hidden target area, and G_edge(x_i) is the output of the boundary detection module after the sigmoid function in step 2021.
And 3, inputting the remote sensing image data uploaded by the client in real time into the trained detection model, and outputting the detection result of the tampered region coordinate and the tampered mode of the remote sensing image by the detection model at the server.
Preferably, in step 201, a remote sensing image large-scale database is constructed through a remote sensing data service provided by a satellite, each sample data of the remote sensing image large-scale database contains a marked tampering region position and a tampering mode, and finally a training set, a test set and a verification set are obtained based on the remote sensing image large-scale database.
Preferably, step 2021 comprises the steps of:
introducing the characteristics related to the boundary artifact in a Sobel layer enhanced model characteristic diagram, and inputting the characteristics of the ith ResNet block into the boundary artifact network of the ith layer after the characteristics of the ith ResNet block pass through the Sobel layer of the ith layer to obtain the characteristics of the ith layer;
the feature of the ith level is fused with the feature of the (i + 1) th level;
passing the feature fused between the ith level and the (i + 1) th level through another boundary artifact network and then transmitting the feature into the (i + 2) th level to be fused with the feature of the (i + 2) th level;
after several levels, the boundary detection module finally outputs K boundary artifact feature maps:
f_edge^k = Sobel-ResNet(x)
where f_edge^k is the kth boundary artifact feature map; Sobel-ResNet denotes the ResNet residual network with Sobel layers; x is the original feature map.
Preferably, step 2023 comprises the steps of:
the dual attention feature fusion module utilizes a channel attention fusion network to link the channel features of the boundary artifact feature maps of the K channels output by the boundary detection module and the channel features of the noise distribution feature maps of the K channels output by the noise detection module so as to selectively emphasize the interdependent channel feature maps;
meanwhile, the dual attention feature fusion module performs weighted summation on the boundary artifact feature maps of the K channels output by the boundary detection module and the features of all the positions of the noise distribution feature maps of the K channels output by the noise detection module by using a position attention fusion network, and selectively updates the features of each position;
the dual attention feature fusion module adds the outputs of the channel attention fusion network and the position attention fusion network and applies a 1×1 convolution for dimensionality reduction to obtain a new fusion feature map f_fusion, as shown in the following formula:
f_fusion = Dual-Attention(f_edge, f_noise)
where Dual-Attention denotes the dual attention feature fusion mechanism.
Preferably, in step 3, the client and the server are implemented on a C/S architecture: the server isolates its network behind a firewall, transmits image data over an encrypted communication protocol to prevent illegal client access, and performs large-scale remote sensing image storage and GPU computation on distributed nodes; the client connects the result output module to a display screen and a printer, enabling on-screen display and document printing of the diagnosis report.
The remote sensing image tampering detection method is reasonable in structural design, utilizes the boundary detection module to find tampered boundary artifacts, captures noise characteristics of tampered areas, improves remote sensing image characteristic expression, and utilizes multi-scale supervision loss to improve the performance of the tampering detection model, so that a tampering target area in the remote sensing image can be accurately located.
Compared with the prior art, the invention has the following effects:
(1) The invention provides a remote sensing image tampered target detection method and system based on multi-view characteristics, which can accurately detect tampered areas and tampered modes in remote sensing images.
(2) The invention provides a boundary detection module which can sharply find the artifact information existing in the boundary between a tampered region and a real region, avoid the remote sensing image recognition model from capturing redundant semantic information and improve the detection precision.
(3) The invention uses a noise detection module, uses the boundary artifact processed on the remote sensing image by disguise shielding and the noise distribution difference between the hidden target area and the rest normal areas of the remote sensing image as the general non-semantic features, and uses a dual attention module to fuse a plurality of features, thereby enhancing the multi-view visual expression capability of the detection model.
(4) The method combines three loss functions of different levels to train the model, wherein pixel-level loss is used for improving the sensitivity of the model to pixel-level operation detection, image-level loss is used for improving the specificity of the model to image-level operation detection, and boundary loss is used for learning non-semantic features, so that not only can a hidden target be accurately positioned, but also the strong generalization capability of the model is ensured.
(5) According to the method and the system for detecting the remote sensing image tampered target based on the multi-view characteristics, the sample database is used for supervised learning, the remote sensing image is sent to the system, the tampered target and the tampered mode existing in the remote sensing image are finally detected, the manual identification process is greatly simplified, the speed and the accuracy of detection and identification are improved, and the detection cost is reduced.
Drawings
FIG. 1 is an overall frame diagram of the present invention;
FIG. 2 is a schematic diagram of a boundary detection module employed in the present invention;
FIG. 3 is a schematic diagram of a dual attention feature fusion module employed in the present invention;
fig. 4 is a diagram of the overall network architecture of the client/server system implementation of the present invention.
Detailed Description
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.
As shown in fig. 1, the present embodiment provides a remote sensing image tampered object detection system based on multi-view features, which includes a boundary detection module, a noise detection module, a dual attention feature fusion module, a result output module, and a multi-scale supervision loss module, where:
the boundary detection module combines block features from different levels of a residual network in a progressive manner, detecting in the remote sensing image uploaded by the client the boundary artifacts produced by the camouflage masking of a hidden target, and thereby obtains a boundary artifact feature map;
the noise detection module carries out noise extraction on the remote sensing image uploaded by the client to obtain a noise image, and then captures the difference characteristic of noise distribution of a hidden target area and the rest normal areas as a universal non-semantic characteristic, so that a noise distribution characteristic diagram is obtained;
the dual attention feature fusion module is used for fusing a boundary artifact feature map obtained by the boundary detection module and a noise distribution feature map obtained by the noise detection module to enhance the multi-view visual expression capability of the detection model;
a result output module: identifying and outputting the tampered region coordinates and the tampered mode of the remote sensing image according to the output result of the dual attention feature fusion module;
the multi-scale supervision loss module is used for jointly training a detection model consisting of a boundary detection module, a noise detection module, a dual attention module and a result output module by using loss functions of three scales, improving the performance of the model and preventing the model from being over-fitted, wherein: the loss functions of the three scales comprise a pixel-level loss function, an image-level loss function and a boundary loss function, the pixel-level loss function is used for improving the sensitivity of the model to pixel-level operation detection, the image-level loss function is used for improving the specificity of the model to the image-level operation detection, and the boundary loss function is used for learning non-semantic features, so that not only can a hidden target be accurately positioned, but also the strong generalization capability of the model is ensured.
The embodiment also discloses a remote sensing image tampered target detection method based on multi-view characteristics, which is realized based on the remote sensing image tampered target detection system, and the method comprises the following steps:
step 1, constructing a detection model based on a boundary detection module, a noise detection module, a double attention feature fusion module and a result output module.
Step 2, training the detection model, comprising the following steps:
step 201, collecting sample data, and constructing a training data set:
a remote sensing image large-scale database is constructed through remote sensing data services provided by high-score satellites, sentinels and other series satellites, each sample data of the remote sensing image large-scale database comprises a marked tampering region position and a tampering mode, and finally a training set, a testing set and a verification set are obtained based on the remote sensing image large-scale database.
And 202, jointly training a detection model by using loss functions of three scales through a multi-scale supervision loss module.
The implementation of the detection model comprises the following steps:
Step 2021, consider that the boundary artifact pattern contained in the feature map of a shallow network alone is diluted after the network's many convolution operations, while the feature map output by the residual network alone cannot capture boundary artifact features; either choice degrades the model's ability to detect area camouflage masking. Therefore, the invention uses the boundary detection module to combine the features of residual blocks at different levels in a progressive manner, detecting the boundary artifacts of the hidden target produced by camouflage masking; this further comprises the following steps:
and introducing the characteristics related to the boundary artifact in the Sobel layer enhanced model characteristic diagram, and inputting the characteristics of the ith ResNet block into the boundary artifact network of the ith layer after the characteristics of the ith ResNet block pass through the Sobel layer of the ith layer to obtain the characteristics of the ith layer. The features of the ith level are fused with the features of the (i + 1) th level. In order to prevent the accumulation effect from causing the deep features to be over-supervised or ignored during the boundary detection, the features fused with the (i) th level and the (i + 1) th level are transmitted into the (i + 2) th level after passing through another boundary artifact network, and are fused with the features of the (i + 2) th level. Through the processing, the model generates more concentrated response to the area nearby the area subjected to the camouflage masking processing in the feature map, and finally K boundary artifact feature maps are obtained through the boundary detection module after a plurality of layers are processed:
f_edge^k = Sobel-ResNet(x)
where f_edge^k is the kth boundary artifact feature map; Sobel-ResNet denotes the ResNet residual network with Sobel layers; x is the original feature map.
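The Sobel layer above is, at its core, a convolution with fixed edge-detection kernels. The following is a minimal illustrative sketch, not the patent's implementation: the specific kernels, the "valid" (unpadded) convolution, and the gradient-magnitude combination are all assumptions made for clarity.

```python
import numpy as np

# Classic 3x3 Sobel kernels (horizontal and vertical gradients).
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def conv2d_valid(img, kernel):
    """Plain 'valid' 2-D cross-correlation (no padding, stride 1)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def sobel_layer(feature_map):
    """Gradient-magnitude response of a 2-D feature map.

    High values mark sharp local transitions, i.e. candidate boundary
    artifacts around a camouflage-masked region.
    """
    gx = conv2d_valid(feature_map, SOBEL_X)
    gy = conv2d_valid(feature_map, SOBEL_Y)
    return np.hypot(gx, gy)
```

On a step image (left half 0, right half 1), the response peaks in the columns straddling the step and vanishes on the flat regions, which is why such a layer emphasizes boundary-artifact features while suppressing uniform content.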
The output of the last-level boundary artifact network in the boundary detection module is passed through a sigmoid function to calculate the boundary detection probability of each pixel, as shown in the following formula:
G_edge(x_i) = σ(Sobel-ResNet(x_i))
where G_edge(x_i) is the boundary detection probability of the ith pixel x_i; σ is the sigmoid function; Sobel-ResNet denotes the ResNet residual network with Sobel layers.
Step 2022, the noise detection module extracts the noise distribution in the remote sensing image with a BayarConv network, and then captures the difference in noise distribution between the hidden target region and the remaining normal regions of the remote sensing image through a residual network (a ResNet-50 network in this embodiment) as a general non-semantic feature, obtaining the final K noise distribution feature maps, as shown in the following formula:
f_noise^k = ResNet(BayarConv(x))
where f_noise^k is the kth noise distribution feature map; x is the input remote sensing image; BayarConv(·) denotes the BayarConv network; ResNet(·) denotes the ResNet-50 network.
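BayarConv refers to the constrained convolution design from image forensics: the centre weight of each kernel is fixed to -1 and the remaining weights are rescaled to sum to 1, so the filter predicts each pixel from its neighbours and outputs the prediction residual, suppressing image content and retaining high-frequency noise. A hedged numpy sketch of that constraint follows; treating it as a one-shot projection is an assumption (in training, the constraint is re-applied to the learned kernel after each update).

```python
import numpy as np

def bayar_constrain(kernel):
    """Apply the Bayar constraint to a square kernel of odd size.

    Centre weight is fixed to -1 and the remaining weights are rescaled
    to sum to 1, so convolving with the result yields the residual
    between a pixel and its neighbourhood prediction (noise extraction).
    """
    k = kernel.astype(float).copy()
    c = k.shape[0] // 2
    k[c, c] = 0.0
    s = k.sum()
    if s == 0:
        # Degenerate kernel: off-centre weights cancel, cannot normalise.
        raise ValueError("off-centre weights must not sum to zero")
    k = k / s          # off-centre weights now sum to 1
    k[c, c] = -1.0     # centre weight fixed to -1
    return k
```

A constrained kernel therefore sums to zero overall, so any locally constant region maps to zero and only the noise residual survives.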
Step 2023, the dual attention feature fusion module fuses the boundary artifact feature maps of the K channels obtained by the boundary detection module and the noise distribution feature maps of the K channels obtained by the noise detection module to obtain a new fusion feature map f_fusion, which specifically comprises the following steps:
the dual attention feature fusion module utilizes a channel attention fusion network to link the channel features of the boundary artifact feature maps of the K channels output by the boundary detection module and the channel features of the noise distribution feature maps of the K channels output by the noise detection module so as to selectively emphasize the interdependent channel feature maps;
meanwhile, the dual attention feature fusion module performs weighted summation on the boundary artifact feature maps of the K channels output by the boundary detection module and the features of all the positions of the noise distribution feature maps of the K channels output by the noise detection module by using a position attention fusion network, and selectively updates the features of each position;
the dual attention feature fusion module adds the outputs of the channel attention fusion network and the position attention fusion network and applies a 1×1 convolution for dimensionality reduction to obtain a new fusion feature map f_fusion, as shown in the following formula:
f_fusion = Dual-Attention(f_edge, f_noise)
where Dual-Attention denotes the dual attention feature fusion mechanism.
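A minimal numpy sketch of this fusion step, assuming the common dual-attention formulation: channel attention from a C×C Gram matrix, position attention from an N×N spatial affinity matrix, the two branch outputs summed, and the 1×1 convolution expressed as a matrix product over channels. The learned projection layers, scaling parameters, and residual connections of a full dual-attention network are omitted here; this only illustrates the data flow.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(f):
    """f: (C, H, W). Reweight channels via a C x C affinity (Gram) matrix."""
    c = f.shape[0]
    flat = f.reshape(c, -1)                  # (C, N)
    att = softmax(flat @ flat.T, axis=-1)    # (C, C)
    return (att @ flat).reshape(f.shape)

def position_attention(f):
    """f: (C, H, W). Reweight positions via an N x N spatial affinity matrix."""
    c, h, w = f.shape
    flat = f.reshape(c, -1)                  # (C, N)
    att = softmax(flat.T @ flat, axis=-1)    # (N, N)
    return (flat @ att.T).reshape(f.shape)

def dual_attention_fuse(f_edge, f_noise, w):
    """Concatenate K-channel edge and noise maps, apply both attention
    branches, sum them, and reduce channels with a 1x1 conv (= matmul).
    w: (C_out, 2K) weights of the 1x1 convolution."""
    f = np.concatenate([f_edge, f_noise], axis=0)      # (2K, H, W)
    fused = channel_attention(f) + position_attention(f)
    c2, h, width = fused.shape
    return (w @ fused.reshape(c2, -1)).reshape(-1, h, width)
```

Expressing the 1×1 convolution as a channel-wise matrix product is exact: a 1×1 kernel mixes channels independently at every spatial position.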
After bilinear upsampling of the fusion feature map f_fusion, the hidden target detection probability of each pixel is calculated through a sigmoid function, as shown in the following formula:
G(x_i) = σ(bilinear-sampling(f_fusion^i))
where G(x_i) is the hidden target detection probability of the ith pixel x_i; f_fusion^i is the ith channel of the fusion feature map; bilinear-sampling(f_fusion^i) denotes bilinear upsampling of f_fusion^i.
Step 2024, based on the fusion feature map f_fusion, the result output module identifies and outputs the coordinates of the tampered area of the remote sensing image and the tampering mode; the tampering mode includes splicing, copying, removal and the like, and the specific position of the tampered area is marked with a red box to facilitate subsequent analysis and processing.
The multi-scale supervised loss module uses three scales of loss functions including:
1) For the pixel-level loss: a hidden target subjected to camouflage masking in a large-format remote sensing image is small, so the hidden target area contains few pixels relative to the whole image. The invention therefore uses the Dice loss as the pixel-level loss loss_pixel to alleviate this extreme data imbalance:
loss_pixel = 1 - (2·Σ_{i=1}^{W×H} y_i·G(x_i)) / (Σ_{i=1}^{W×H} y_i + Σ_{i=1}^{W×H} G(x_i))
where W and H are the width and height of the remote sensing image, and y_i is the value of the ith pixel of the label map y.
2) For the image-level loss: the training stage detects, from the macroscopic view of the whole remote sensing image, whether a hidden target subjected to camouflage masking exists, so that misclassification of a small number of pixels does not cause misjudgment of the whole image. Therefore, the invention uses the binary cross-entropy loss on the whole remote sensing image as the image-level loss loss_img:
loss_img = -(y·log G(x) + (1-y)·log(1-G(x)))
where G(x) is the detection probability of a hidden target in image x, and y indicates whether a hidden target exists in the image.
3) For the boundary loss: since the number of boundary pixels of the hidden-target region is far smaller than the number of non-boundary pixels, the invention likewise uses the Dice loss as the boundary loss loss_edge to detect the boundary of the camouflage-masked hidden-target region:
loss_edge = 1 − (2·Σ_{i=1}^{W×H} G_edge(x_i)·y'_i) / (Σ_{i=1}^{W×H} G_edge(x_i) + Σ_{i=1}^{W×H} y'_i)
in the formula, y'_i ∈ {0, 1} represents whether the i-th pixel belongs to the hidden-target region boundary, and G_edge(x_i) is the output of the boundary detection module passed through the sigmoid function of step 2021.
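The three losses above can be sketched together in PyTorch (an assumption; the patent names no framework). The equal weighting of the three terms is also an assumption, since the patent does not specify weights.

```python
import torch

def dice_loss(pred, target, eps=1e-6):
    """Dice loss 1 - 2|P∩T|/(|P|+|T|); used for both loss_pixel and loss_edge."""
    pred, target = pred.flatten(), target.flatten()
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def bce_loss(g_x, y):
    """Image-level binary cross entropy loss_img."""
    return -(y * torch.log(g_x) + (1 - y) * torch.log(1 - g_x))

def multi_scale_loss(pix_prob, pix_gt, img_prob, img_gt, edge_prob, edge_gt,
                     weights=(1.0, 1.0, 1.0)):
    """Joint training objective: weighted sum of the three losses."""
    w_p, w_i, w_e = weights
    return (w_p * dice_loss(pix_prob, pix_gt)
            + w_i * bce_loss(img_prob, img_gt)
            + w_e * dice_loss(edge_prob, edge_gt))

pix_prob = torch.tensor([0.9, 0.1, 0.8, 0.2]); pix_gt = torch.tensor([1., 0., 1., 0.])
edge_prob = torch.tensor([0.7, 0.2]);          edge_gt = torch.tensor([1., 0.])
total = multi_scale_loss(pix_prob, pix_gt,
                         torch.tensor(0.9), torch.tensor(1.0),
                         edge_prob, edge_gt)
assert float(total) > 0.0
```

With the near-perfect predictions above, the pixel-level Dice term evaluates to 0.15, so the combined loss stays small, as expected for a well-trained model.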
And 3, inputting the remote sensing image data uploaded by the client in real time into the trained detection model, and outputting the detection result of the tampered region coordinate and the tampered mode of the remote sensing image by the detection model at the server.
In this embodiment, the whole system is implemented on a C/S (client/server) architecture. The server side is isolated from the network by a firewall, image data are transmitted over an encrypted communication protocol to prevent access by illegal clients, and large-scale remote sensing image data storage and GPU computation are carried out on distributed nodes. On the client side, a display screen and a printer are connected to the signal output of the result output module, realizing on-screen display and document printing of diagnosis reports, which facilitates subsequent auditing and verification by image forensics analysts.
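The detection pipeline of steps 2021 to 2024 (boundary branch, noise branch, fusion, per-pixel output) can be sketched end to end. Every layer here is a deliberately simplified stand-in (single convolutions instead of the Sobel-ResNet, noise, and dual attention networks), assuming PyTorch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DetectionModelSketch(nn.Module):
    """Minimal end-to-end sketch: a boundary branch and a noise branch each
    produce K feature maps, which are fused, upsampled and squashed to a
    per-pixel tamper probability. Layer choices are illustrative only."""
    def __init__(self, k: int = 8):
        super().__init__()
        self.boundary = nn.Conv2d(3, k, 3, padding=1)  # stands in for Sobel-ResNet
        self.noise = nn.Conv2d(3, k, 3, padding=1)     # stands in for the noise branch
        self.fuse = nn.Conv2d(2 * k, 1, 1)             # stands in for dual attention fusion

    def forward(self, x):
        f_edge = self.boundary(x)
        f_noise = self.noise(x)
        f_fusion = self.fuse(torch.cat([f_edge, f_noise], dim=1))
        up = F.interpolate(f_fusion, size=x.shape[-2:], mode="bilinear",
                           align_corners=False)
        return torch.sigmoid(up)                       # per-pixel probability map

probs = DetectionModelSketch()(torch.randn(1, 3, 64, 64))
assert probs.shape == (1, 1, 64, 64)
```

The probability map would then be thresholded to locate the tampered region and draw the bounding box described in step 2024.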

Claims (7)

1. A remote sensing image tampered target detection system based on multi-view features, characterized in that it comprises a boundary detection module, a noise detection module, a dual attention feature fusion module, a result output module and a multi-scale supervised loss module, wherein:
the boundary detection module combines block features of different levels of a residual network in a progressive manner, so as to detect, in the remote sensing image uploaded by a client, the boundary artifacts produced by the camouflage masking of a hidden target, thereby obtaining a boundary artifact feature map;
the noise detection module extracts noise from the remote sensing image uploaded by the client to obtain a noise image, and then captures the difference in noise distribution between the hidden-target region and the remaining normal regions as a universal non-semantic feature, thereby obtaining a noise distribution feature map;
the dual attention feature fusion module is used for fusing a boundary artifact feature map obtained by the boundary detection module and a noise distribution feature map obtained by the noise detection module;
the result output module identifies and outputs the coordinates of the tampered region of the remote sensing image and the tampering mode according to the output of the dual attention feature fusion module;
and the multi-scale supervised loss module jointly trains, using loss functions of three scales, a detection model consisting of the boundary detection module, the noise detection module, the dual attention feature fusion module and the result output module.
2. The remote sensing image tampered target detection system based on multi-view features as claimed in claim 1, wherein the loss functions of the three scales include a pixel-level loss function, an image-level loss function and a boundary loss function, wherein the pixel-level loss function is used for improving sensitivity of a model to pixel-level operation detection, the image-level loss function is used for improving specificity of the model to image-level operation detection, and the boundary loss function is used for learning non-semantic features, so that not only can a hidden target be accurately located, but also strong generalization capability of the model is guaranteed.
3. The remote sensing image tampered target detection method based on the multi-view characteristic and realized by the remote sensing image tampered target detection system of claim 1 is characterized by comprising the following steps of:
step 1, constructing a detection model based on a boundary detection module, a noise detection module, a dual attention feature fusion module and a result output module;
step 2, training the detection model, comprising the following steps:
step 201, collecting sample data and constructing a training data set;
step 202, the multi-scale supervised loss module jointly trains the detection model using loss functions of three scales;
the implementation of the detection model comprises the following steps:
step 2021, the boundary detection module combines the features of the residual blocks of different levels in a progressive manner based on the remote sensing image, and detects the boundary artifacts of the camouflage-masked hidden target, obtaining K boundary artifact feature maps;
the output of the boundary detection module is passed through a sigmoid function to calculate the boundary detection probability of each pixel, as shown in the following formula:
G_edge(x_i) = σ(Sobel-ResNet(x_i))
in the formula, G_edge(x_i) represents the boundary detection probability of the i-th pixel x_i; σ represents the sigmoid function; Sobel-ResNet(x_i) represents a ResNet residual network with a Sobel layer;
step 2022, extracting noise distribution in the remote sensing image by a noise detection module, capturing a difference characteristic of the noise distribution of the hidden target region and the rest normal regions of the remote sensing image through a residual error network, and taking the difference characteristic as a universal non-semantic characteristic to obtain final K noise distribution characteristic graphs;
step 2023, the dual attention feature fusion module fuses the boundary artifact feature maps of the K channels obtained by the boundary detection module and the noise distribution feature maps of the K channels obtained by the noise detection module to obtain a new fused feature map f_fusion;
the fused feature map f_fusion is bilinearly upsampled, and the hidden-target detection probability of each pixel is then calculated through a sigmoid function, as shown in the following formula:
G(x_i) = σ(bilinear_sampling(f_i^fusion))
in the formula: G(x_i) represents the hidden-target detection probability of the i-th pixel x_i; f_i^fusion represents the i-th channel of the fused feature map; bilinear_sampling(f_i^fusion) represents bilinear upsampling applied to f_i^fusion;
step 2024, the result output module identifies and outputs the coordinates of the tampered region of the remote sensing image and the tampering mode based on the fused feature map f_fusion;
the multi-scale supervised loss module uses three scales of loss functions including:
for pixel level loss, the Dice loss is used as the pixel level loss pixel
Figure FDA0003858307170000021
In the formula, W, H represents the width and height of the remote sensing image, y i And representing the value of the ith pixel point of the characteristic diagram y.
for the image-level loss, the binary cross-entropy loss is used as the image-level loss loss_img:
loss_img = −(y·log G(x) + (1 − y)·log(1 − G(x)))
in the formula, G(x) represents the hidden-target detection probability for image x, and y represents whether a hidden target exists in the image;
for boundary loss, dice loss is used as boundary loss edge To detect the hidden target area boundary of the masquerading process:
Figure FDA0003858307170000031
of formula (II) to (III)' i E {0,1} represents whether the ith pixel belongs to the hidden target area boundary, and the output of the boundary detection module is calculated by a sigmoid function in step 2021.
And 3, inputting the remote sensing image data uploaded by the client in real time into the trained detection model, and outputting a detection result of the tampered region coordinate and the tampered mode of the remote sensing image by the detection model at the server.
4. The method for detecting the tampered target of the remote sensing image based on the multi-view feature as claimed in claim 3, characterized in that in step 201 a large-scale remote sensing image database is constructed through remote sensing data services provided by satellites, each sample in the database contains an annotated tampered-region position and tampering mode, and a training set, a test set and a validation set are finally derived from the database.
5. The method for detecting the tampered target of the remote sensing image based on the multi-view characteristic as claimed in claim 3, wherein the step 2021 comprises the following steps:
a Sobel layer is introduced to enhance the boundary-artifact-related features in the model feature maps: the features of the i-th ResNet block are passed through the i-th level Sobel layer and then input into the i-th level boundary artifact network to obtain the i-th level features;
the feature of the ith level is fused with the feature of the (i + 1) th level;
passing the feature fused between the ith level and the (i + 1) th level through another boundary artifact network and then transmitting the feature into the (i + 2) th level to be fused with the feature of the (i + 2) th level;
after several such levels, the boundary detection module finally yields K boundary artifact feature maps:
f_k^edge = Sobel-ResNet(x), k = 1, 2, …, K
in the formula: f_k^edge represents the k-th boundary artifact feature map; Sobel-ResNet represents a ResNet residual network with a Sobel layer; x represents the original feature map.
6. The method for detecting the tampered target of the remote sensing image based on the multi-view characteristic as claimed in claim 3, wherein the step 2023 comprises the following steps:
the dual attention feature fusion module utilizes a channel attention fusion network to link the channel features of the boundary artifact feature maps of the K channels output by the boundary detection module and the channel features of the noise distribution feature maps of the K channels output by the noise detection module so as to selectively emphasize the interdependent channel feature maps;
meanwhile, the dual attention feature fusion module performs weighted summation on the boundary artifact feature maps of the K channels output by the boundary detection module and the features of all the positions of the noise distribution feature maps of the K channels output by the noise detection module by using a position attention fusion network, and selectively updates the features of each position;
the dual attention feature fusion module adds the outputs of the channel attention fusion network and the position attention fusion network and performs a 1×1 convolution for dimension reduction to obtain a new fused feature map f_fusion, as shown in the following formula:
f_fusion = Conv_{1×1}(Dual-Attention(f_edge, f_noise))
in the formula, Dual-Attention represents the dual attention feature fusion mechanism, and f_edge and f_noise are the K-channel boundary artifact and noise distribution feature maps.
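A minimal sketch of the channel attention and position attention branches of claim 6, in the spirit of DANet-style dual attention, assuming PyTorch. The patent does not give the exact attention layers, so the query/key projections are omitted and raw feature similarities are used instead:

```python
import torch
import torch.nn as nn

class DualAttentionFusion(nn.Module):
    """Channel attention branch plus position attention branch, summed and
    reduced with a 1x1 convolution to K channels (a simplified sketch)."""
    def __init__(self, k: int):
        super().__init__()
        self.reduce = nn.Conv2d(2 * k, k, kernel_size=1)

    def forward(self, f_edge, f_noise):
        x = torch.cat([f_edge, f_noise], dim=1)      # (N, 2K, H, W)
        n, c, h, w = x.shape
        flat = x.view(n, c, h * w)                   # (N, 2K, HW)

        # channel attention: emphasise interdependent channel maps
        chan = torch.softmax(flat @ flat.transpose(1, 2), dim=-1)   # (N, 2K, 2K)
        x_chan = (chan @ flat).view(n, c, h, w)

        # position attention: weighted sum over all spatial positions
        pos = torch.softmax(flat.transpose(1, 2) @ flat, dim=-1)    # (N, HW, HW)
        x_pos = (flat @ pos).view(n, c, h, w)

        # add both branches, then 1x1 conv for dimension reduction
        return self.reduce(x_chan + x_pos)           # f_fusion: (N, K, H, W)

f_edge = torch.randn(1, 8, 16, 16)
f_noise = torch.randn(1, 8, 16, 16)
f_fusion = DualAttentionFusion(8)(f_edge, f_noise)
assert f_fusion.shape == (1, 8, 16, 16)
```

The O((HW)²) position-attention matrix is the usual cost of this design, which is why it is applied on the downsampled feature maps rather than the full-resolution image.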
7. The method for detecting the tampered target of the remote sensing image based on the multi-view feature as claimed in claim 3, characterized in that in step 3 the client and the server are implemented on a C/S architecture; the server isolates the network through a firewall, transmits image data over an encrypted communication protocol to prevent access by illegal clients, and performs large-scale remote sensing image data storage and GPU computation on distributed nodes; and the client connects a display screen and a printer to the signal output of the result output module, realizing on-screen display and document printing of the diagnosis report.
CN202211155410.4A 2022-09-22 2022-09-22 Remote sensing image tampered target detection method and system based on multi-view features Pending CN115496980A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211155410.4A CN115496980A (en) 2022-09-22 2022-09-22 Remote sensing image tampered target detection method and system based on multi-view features


Publications (1)

Publication Number Publication Date
CN115496980A true CN115496980A (en) 2022-12-20

Family

ID=84469854

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211155410.4A Pending CN115496980A (en) 2022-09-22 2022-09-22 Remote sensing image tampered target detection method and system based on multi-view features

Country Status (1)

Country Link
CN (1) CN115496980A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116385814A (en) * 2023-03-07 2023-07-04 广州市妇女儿童医疗中心 Ultrasonic screening method, system, device and medium for detection target
CN116385814B (en) * 2023-03-07 2023-12-05 广州市妇女儿童医疗中心 Ultrasonic screening method, system, device and medium for detection target
CN116935200A (en) * 2023-09-19 2023-10-24 南京信息工程大学 Audit-oriented image tampering detection method, system, equipment and storage medium
CN116935200B (en) * 2023-09-19 2023-12-19 南京信息工程大学 Audit-oriented image tampering detection method, system, equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination