CN111611861A

CN111611861A - Image change detection method based on multi-scale feature association

Info

Publication number: CN111611861A
Application number: CN202010321835.2A
Authority: CN
Inventors: 颜成钢; 白俊杰; 龚镖; 孙垚棋; 张继勇; 张勇东
Original assignee: Hangzhou Dianzi University
Current assignee: Hangzhou Dianzi University
Priority date: 2020-04-22
Filing date: 2020-04-22
Publication date: 2020-09-01
Anticipated expiration: 2040-04-22
Also published as: CN111611861B

Abstract

The invention provides an image change detection method based on multi-scale feature association. Firstly, selecting a data set and preprocessing the data set, then building a change detection network, and simultaneously extracting features of a reference image and a current image; and then carrying out feature association and fusion on the extracted image features, obtaining the output of the network through a softmax function, and finally inputting the images of the test set into a trained change detection network model, wherein the change detection network outputs a confidence coefficient result of the judgment of each pixel class. The method improves the detection speed and enlarges the application scene of the change detection technology, so that the characteristic information contains more detail information and high-level semantic information, and the detection effect and the detection precision of the change area can be effectively improved.

Description

Image change detection method based on multi-scale feature association

Technical Field

The invention relates to the field of deep learning and computer vision, in particular to a method for detecting a change region between images through a neural network model.

Background

Scene understanding in real-world scenes is a very challenging problem, which is far from developing to meet the current application requirements. In the fields of urban intelligent monitoring, remote sensing image analysis and the like, a visual change detection technology is a better scheme for detecting the change of a specific area of an urban area, the change of geographic information in a remote sensing image and the like. Visual changes are a high-level reasoning task aimed at accurately identifying changes between a reference image (history) and a current image scene. In the construction of intelligent cities, the detection of illegal encroachment in public places is an important task, and the encroachment mainly means unreasonable temporary use in public areas. In urban supervision, government personnel usually check and judge manually, which is time-consuming, and in the rapid development of deep learning, computer vision technology based on deep learning can be used for judging, and public areas invaded in cities belong to changed areas compared with the situation that the public areas are not invaded before. We can detect this by a change detection method. In video analysis, change detection is also often used as a high level of scene understanding, often by comparing changes in the video background over any two consecutive frames or short periods of time. In the research of remote sensing images, changes in the remote sensing images generally refer to changes in earth surface components, and are very useful in the analysis of land utilization and the exploration of land resources. The remote sensing image is detected by a change detection method, so that the rapid analysis of the land change condition is facilitated.

Some key challenges to the detection of visual changes between two images mainly include illumination, contrast, noise, shooting angle changes, occlusion, and other factors. In the traditional image processing method, interference of factors such as illumination, contrast and noise is better processed, but a good solution is not provided for shooting angle change, shading and the like. Under the rapid development of deep learning, the detection of the changing content between images by a network model becomes a feasible method through a large amount of data driving.

Disclosure of Invention

Aiming at the defects in the prior art, the invention provides an image change detection method based on multi-scale feature association. The change detection of images of the same scene taken from different perspectives, lighting and seasons requires the detection of the changed area of the images compared to the reference image and the type of objects (such as vehicles and trash cans) in the area.

According to the method, a double-input feature extraction network is built, feature information extracted from two pictures is correlated by using a multi-scale method, and the area and the type of the change of the pictures are predicted. The method comprises the steps that feature information of two compared images is simultaneously extracted by an input convolution network, then feature graphs of different scales are selected for up-sampling, the feature graphs of the two images are correlated through a correlation algorithm to obtain correlation data, and a softmax classifier is used for analyzing features to obtain a change area and a change category.

The method comprises the following specific steps:

selecting a data set and preprocessing the data set;

step (2), constructing a change detection network;

the feature extraction layer of the change detection network adopts a ResNet50 network and a feature pyramid design mode.

Step (3), feature extraction is carried out on the reference image and the current image simultaneously;

step (4), performing feature association and fusion on the extracted image features;

step (5), using a loss function;

the change detection network is trained in an end-to-end mode, and the loss function of the whole network is as follows:

wherein y is_iIn order to change the output of the detection network,

indicates the label value (Ground Truth) in the dataset;

step (6) of outputting the result

And inputting the test set image into a trained change detection network model, and outputting a confidence result of judgment of each pixel class by the change detection network.

Selecting a data set and preprocessing the data set;

(1.1) preparing training data sets and testing data sets, including VL-CMU-CD, tsunamii and GSV data sets;

(1.2) reading according to the format of the data set to obtain original image data, wherein the original image data comprises;

(1.3) carrying out image pixel normalization processing on the original image data;

step (2), constructing a change detection network;

And (3) simultaneously extracting the features of the reference image and the current image, wherein the specific steps are as follows:

(3.1) firstly, inputting a reference image and a current image into an image1 and an image2 of a change detection network respectively, and performing preliminary feature extraction on the input reference image and the input current image respectively through a first-layer residual block cp1, a second-layer residual block cp2 and a third-layer residual block cp3 of the change detection network to obtain a preliminarily extracted reference image feature map and a preliminarily extracted current image feature map. The same network structure is adopted in the extraction of the image1 and the image2, and the weight is shared between the networks;

(3.2) carrying out feature extraction on the preliminarily extracted reference image feature map and the current image feature map through the same residual block, and then obtaining feature information of different scales by adopting a feature pyramid mode;

(a) and downsampling the reference image feature map through a fourth-layer residual block of ResNet50 to obtain a feature map (r), and downsampling the current image feature map through a fourth-layer residual block of ResNet50 to obtain a feature map (r).

(b) Downsampling feature map ① with the fifth layer residual block of ResNet50 to obtain feature map ②, and downsampling feature map ⑩ with the fifth layer residual block of ResNet50 to obtain feature map

(c) Up-sampling the characteristic map ② by deconvolution network, fusing the up-sampled characteristic map with the characteristic map ① to obtain a characteristic map ③, and performing deconvolution network on the characteristic map

Performing upsampling, and fusing the feature map obtained by the upsampling with the feature map ⑩ to obtain a feature map

(d) Up-sampling the characteristic map ③ by deconvolution network, fusing the up-sampled characteristic map with the reference characteristic map to obtain characteristic map ④, and performing deconvolution network on the characteristic map

Performing up-sampling, and fusing the feature map obtained by up-sampling with the current feature map to obtain the feature map

And (3.3) simultaneously extracting the features obtained by the three layers of networks in front of the change detection network, namely the preliminarily extracted reference image feature map and the current image feature map, by using a SUBCDnet module to perform feature extraction and feature fusion, and finally extracting the feature map associated with the image.

A reference image feature map cp3_1 extracted from image1, and a current image feature map cp3_2 extracted from image 2. The reference image feature map is obtained by using the fourth layer of residual block of resnet50 to obtain a feature map with a size of 14 × 1024, then obtained by using 1 × 1 full convolution layer FC to obtain a feature map with a size of 14 × 11, and obtained by using a deconvolution network to obtain a feature map with a size of 224 × 11, which is denoted as f 1. Calculation was continued through the fifth layer residual block of resnet50 on the basis of the fourth layer residual block to obtain a 7 × 2048 signature, and then through the full convolution layer and the deconvolution layer, a 224 × 11 signature was obtained, which is denoted as f 1. As with the processing of the reference image feature maps, the current image feature maps are subjected to convolution and deconvolution operations to generate feature maps f3 and f4, respectively, of size 224 × 11. When the feature fusion is carried out, feature graphs obtained from the same level are selected for association, namely, the feature graphs f1 and f4 are selected for fusion, f2 and f3 are fused, the fused feature graphs are subjected to SUM fusion after a layer of full convolution operation (FC), and finally, feature graphs associated with images with the size of 224 x 11 are obtained.

And (4) carrying out feature association and fusion on the extracted image features, wherein the specific steps are as follows:

(4.1) performing feature association on the extracted features of the two images with different scales, selecting the features of corresponding dimensions in the two images, and associating the features in the two images in a convolution association correlation mode, wherein a specific mathematical formula of the convolution association correlation is as follows:

a) wherein x₁，x₂Center coordinates of the indicated area, f₁And f₂Representing the characteristic regions involved in the operation.

b) Map feature ② and map feature

Performing feature association in a correlation mode to obtain a feature map ⑥;

c) map feature ③ and map feature

Performing feature association in a correlation mode to obtain a feature map ⑦;

d) map feature ④ and map feature

Performing feature association in a correlation mode to obtain a feature map ⑧;

and (4.2) carrying out feature fusion on the feature graphs obtained by association, wherein for the feature graphs with smaller feature sizes, the features are deconvoluted before the feature fusion, so that the dimensions of all the feature graphs participating in the fusion are the same. Deconvoluting the characteristic diagrams (c), (c) and (b) to make the characteristic dimension the same as the characteristic diagram (c), fusing the characteristic diagram obtained by deconvolution with the characteristic diagram (c) to obtain a characteristic diagram (nini), and finally obtaining the output of the network, namely change map, by means of the softmax function.

Step (5), use of loss function

The network is trained in an end-to-end mode, and the loss function of the whole network is as follows:

wherein y is_iIs the output of the network and is,

indicating the marker value in the data set (Ground Truth)

Step (6) of outputting the result

And inputting the test set image into a trained change detection network model, and outputting a confidence result of judgment of each pixel class by the change detection network. The size of the output end dimension of the change detection network is 224 × 11, wherein 224 × 224 represents the pixel size of the image, 11 represents the prediction of 11 types of the pixels, and 11 change types are shared in the data set, so that the prediction of the type of each pixel belongs to each pixel.

The invention has the beneficial effects that: the invention realizes end-to-end change detection by a neural network learning method, and simultaneously realizes the classification of change areas, thereby improving the detection speed and expanding the application scene of the change detection technology; when the network is designed, a characteristic pyramid mode is adopted, and the characteristic information of multiple layers can be fused, so that the characteristic information contains more detailed information and high-level semantic information. During network design, a characteristic association mode is designed, and the detection effect and the detection precision of a change area can be effectively improved.

Drawings

FIG. 1 is a schematic diagram of a change detection network structure associated with multi-scale features;

FIG. 2 is a schematic diagram of the SUBCDnet module structure.

Detailed Description

The objects and effects of the present invention will become more apparent from the following detailed description of the present invention with reference to the accompanying drawings.

FIG. 1 is a schematic diagram of a change detection network structure associated with multi-scale features, wherein a ResNet50 network is adopted as a feature extraction layer of the network, and a feature pyramid design mode is adopted, in the diagram, cp1, cp2 and cp3 represent the first three residual blocks in the ResNet50 network structure, each residual block consists of a convolution layer and a pooling layer, the size of two input images in the diagram is 224 × 3, and features are extracted through three partial structures of cp1, cp2 and cp3, and ① -cp 3 in the diagram

Feature maps representing the output of each layer of the network are obtained, in a network structure for extracting features of image1, a feature map ① is obtained by calculating a feature map obtained from cp3 through a fourth residual block of resnet50, the dimension of the feature map is 14 x 1024, a feature map ② is obtained by calculating ① through a fifth residual block of resnet50, the dimension of the feature map is 7 x 2048, a feature map obtained by deconvoluting a feature map ② and a feature map ① are subjected to feature fusion to obtain a feature map ③, the feature map is 14 x 1024, a feature map obtained by deconvoluting a feature map ③ and a feature map ② are subjected to feature fusion to obtain a feature map ④, the feature map is 28 x 512, in the network structure for extracting features of image2,the structure of the feature extraction method is the same as that of a network for extracting features of image1, so that the dimensions of the obtained feature graph ⑩ are the same as those of the feature graph ①, and the size of the feature graph is 14X 1024

The feature map ② has dimension of 7 × 2048

The feature map ③ has dimensions of 14 × 1024

The SUBCDnet module network fusion image1 extracts the network cp3 characteristics and the image2 extracts the network cp3 characteristics to obtain a characteristic map ⑤ with the dimension of 224 × 11 in the characteristic map ② and the characteristic map 3 in the association structure

Performing correlation operation to obtain feature map ⑥, feature map ③ and

perform correlation operations to feature map ⑦, feature map ④, and

and performing association operation to obtain a feature map ⑧, performing deconvolution operation on feature maps ⑥⑦ and ⑧ before fusion in order to fuse the associated feature maps with multiple scales so that feature dimensions are the same as those of the feature map ⑤, then performing fusion on the feature map obtained after the deconvolution operation and the feature map ⑤ to obtain a feature map ⑨ with the structure size of 224 × 11, and finally obtaining a prediction result of an output network, namely change map, through a softmax function.

Fig. 2 is a schematic structural diagram of a SUBCDnet module, which is used for hierarchically fusing the feature maps of two preliminarily extracted images.

Cp3_1 and cp3_2 in fig. 2 are feature diagrams extracted from the cp3 structure in fig. one (cp3_1 and cp3_2 have the same structure, and for convenience of description, the cp3 structures in two network molecules are respectively referred to as cp3_1 and cp _2), cp3_1 is a feature diagram extracted from image1, and cp3_2 is a feature diagram extracted from image 2. The cp3_1 signature was obtained from the fourth layer of residual block (cp4_1) of resnet50, with a signature size of 14 × 1024, then from the full convolution layer (FC) of 1 × 1, a signature size of 14 × 11 was obtained, and from the deconvolution network, a signature size of 224 × 11 was obtained, denoted as f 1. Calculation was continued through the fifth layer residual block of resnet50 (cp5_1) based on cp4_1 to obtain a 7 × 2048 signature, and then through the full convolution layer and the deconvolution layer to obtain a 224 × 11 signature, denoted as f 2. Similar to the processing of the feature map cp3_1, the feature maps f3 and f4 with sizes of 224 × 11 are obtained by the cp3_2 through convolution and deconvolution operations, respectively. When feature fusion is carried out, feature graphs obtained from the same level are selected for association, namely, the feature graphs f1 and f4 are selected for fusion, f2 and f3 are selected for fusion, the fused feature graphs are subjected to SUM fusion after a layer of full convolution operation (FC), and finally feature graphs with the size of 224 × 11 are obtained.

The invention provides a change detection method based on multi-scale feature association, which specifically comprises the following steps:

selecting a data set and preprocessing the data set;

step (2), constructing a change detection network;

the feature extraction layer of the change detection network adopts a ResNet50 network and a feature pyramid design mode. The ResNet50 network includes 5 different volume blocks.

step (5), using a loss function;

wherein y is_iIn order to change the output of the detection network,

indicates the label value (Ground Truth) in the dataset;

step (6) of outputting the result

Selecting a data set and preprocessing the data set;

step (2), constructing a change detection network;

the feature extraction layer of the change detection network adopts a ResNet50 network and a feature pyramid design mode. The ResNet50 network includes 5 different residual blocks.

e) wherein x₁，x₂Center coordinates of the indicated area, f₁And f₂Representing the characteristic regions involved in the operation.

f) Map feature ② and map feature

g) map feature ③ and map feature

h) map feature ④ and map feature

Step (5), use of loss function

wherein y is_iIs the output of the network and is,

indicating the marker value in the data set (Ground Truth)

Step (6) of outputting the result

Claims

1. An image change detection method based on multi-scale feature association is characterized by comprising the following steps:

selecting a data set and preprocessing the data set;

step (2), constructing a change detection network;

a ResNet50 network is adopted in a feature extraction layer of the change detection network, and a feature pyramid design mode is adopted;

step (5), using a loss function;

wherein y is_iIn order to change the output of the detection network,

indicates the label value (Ground Truth) in the dataset;

step (6) of outputting the result

2. The method for detecting image change based on multi-scale feature association according to claim 1, wherein the step (1) selects a data set and preprocesses the data set, and the specific operations are as follows;

step (2), constructing a change detection network;

3. The method for detecting image change based on multi-scale feature association according to claim 2, wherein the step (3) performs feature extraction on the reference image and the current image simultaneously, and comprises the following specific steps:

(3.1) firstly, respectively inputting a reference image and a current image into an image1 and an image2 of a change detection network, and respectively performing preliminary feature extraction on the input reference image and the current image through a first-layer residual block cp1, a second-layer residual block cp2 and a third-layer residual block cp3 of the change detection network to obtain a preliminarily extracted reference image feature map and a preliminarily extracted current image feature map; the same network structure is adopted in the extraction of the image1 and the image2, and the weight is shared between the networks;

(a) down-sampling the reference image feature map through a fourth layer of residual block of ResNet50 to obtain a feature map (r), and down-sampling the current image feature map through a fourth layer of residual block of ResNet50 to obtain a feature map (r);

(c) The feature map ② is upsampled by a deconvolution network, and the upsampled feature map is added to the feature map ①Line fusion to obtain characteristic graph ③, and deconvolution network to obtain characteristic graph

(3.3) simultaneously extracting the features obtained by the three layers of networks in front of the change detection network, namely the preliminarily extracted reference image feature map and the current image feature map, by using a SUBCDnet module to extract the features and fuse the features, and finally extracting a feature map associated with the image;

a reference image feature map cp3_1 extracted from image1, a current image feature map cp3_2 extracted from image 2; obtaining feature maps with the size of 14 × 1024 from the fourth layer of residual block of the resnet50 by referring to the image feature maps, then obtaining feature maps with the size of 14 × 11 from the full convolution layer FC with 1 × 1, and obtaining feature maps with the size of 224 × 11 by deconvolution network, which is marked as f 1; calculating continuously through a fifth layer residual block of resnet50 on the basis of a fourth layer residual block to obtain a characteristic diagram of 7 × 2048, and then obtaining a characteristic diagram with the size of 224 × 11 through a full convolution layer and a deconvolution layer, wherein the characteristic diagram is marked as f 1; as with the processing of the reference image feature map, the current image feature map is subjected to convolution and deconvolution operations to obtain feature maps f3 and f4 of size 224 × 11, respectively; when the feature fusion is carried out, feature graphs obtained from the same level are selected for association, namely, the feature graphs f1 and f4 are selected for fusion, f2 and f3 are fused, the fused feature graphs are subjected to SUM fusion after a layer of full convolution operation (FC), and finally, feature graphs associated with images with the size of 224 x 11 are obtained.

4. The method for detecting image change based on multi-scale feature association according to claim 3, wherein the step (4) performs feature association and fusion on the extracted image features, and comprises the following specific steps:

a) wherein x₁，x₂Center coordinates of the indicated area, f₁And f₂Representing characteristic regions participating in operation;

b) map feature ② and map feature

c) map feature ③ and map feature

d) map feature ④ and map feature

(4.2) carrying out feature fusion on the feature graphs obtained by association, wherein for the feature graphs with smaller feature scale, the features are deconvoluted before the feature fusion, so that the dimensions of all the feature graphs participating in the fusion are the same; deconvoluting the characteristic diagrams (c), (c) and (b) to make the characteristic dimension the same as the characteristic diagram (c), fusing the characteristic diagram obtained by deconvolution with the characteristic diagram (c) to obtain a characteristic diagram (nini), and finally obtaining the output of the network, namely change map, by means of the softmax function.

5. The method according to claim 4, wherein the dimension of the output end of the change detection network in the step (6) is 224 × 11, wherein 224 × 224 represents the pixel size of the image, 11 represents the prediction of 11 categories to which the pixels belong, and the change category in 11 is shared in the data set, so that the prediction of the category to which each pixel belongs is performed.