CN112949388A - Image processing method and device, electronic equipment and storage medium

Image processing method and device, electronic equipment and storage medium

Info

Publication number
CN112949388A
CN112949388A
Authority
CN
China
Prior art keywords
image
feature
change detection
feature point
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110112824.8A
Other languages
Chinese (zh)
Other versions
CN112949388B (en)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Sensetime Intelligent Technology Co Ltd
Original Assignee
Shanghai Sensetime Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Sensetime Intelligent Technology Co Ltd filed Critical Shanghai Sensetime Intelligent Technology Co Ltd
Priority to CN202110112824.8A priority Critical patent/CN112949388B/en
Publication of CN112949388A publication Critical patent/CN112949388A/en
Priority to PCT/CN2021/121164 priority patent/WO2022160753A1/en
Application granted granted Critical
Publication of CN112949388B publication Critical patent/CN112949388B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/176 Urban or other man-made structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Abstract

The application provides an image processing method, an image processing device, electronic equipment and a storage medium. The method may include performing feature extraction on two-phase images input to a target change detection network to obtain a first image feature and a second image feature, and obtaining, based on the first image feature and the second image feature, an image processing result for at least one of the two-phase images and a change detection result for a target in the two-phase images, both output by the target change detection network.

Description

Image processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to computer technologies, and in particular, to an image processing method and apparatus, an electronic device, and a storage medium.
Background
At present, change detection for the same target is of great importance. Target change detection determines whether a target at the same location has changed between different time periods.
For example, in building change detection, remote sensing images taken at different times of the same location are used to determine whether the buildings contained in the images have changed, for example, whether a building has disappeared, been extended, or been newly built, so that illegal construction can be monitored.
It can be seen that a high-performance target change detection method is required.
Disclosure of Invention
In view of the above, the present application discloses at least an image processing method, which may include:
performing feature extraction on two-phase images input into the target change detection network to obtain a first image feature and a second image feature;
and obtaining, based on the first image feature and the second image feature, an image processing result for at least one of the two-phase images and a change detection result for a target in the two-phase images, both output by the target change detection network.
In some embodiments, the extracting features of the two-phase image input to the change detection network to obtain the first image feature and the second image feature includes:
respectively extracting the features of the two-stage images to obtain a first image feature corresponding to a first image in the two-stage images and a second image feature corresponding to a second image in the two-stage images;
the obtaining, based on the first image feature and the second image feature, an image processing result for at least one of the images in two phases and an object change detection result for the image in two phases output by the object change detection network includes:
performing image processing on the first image characteristic and/or the second image characteristic to obtain an image processing result;
and performing feature fusion on the first image features and the second image features to obtain fusion features, and performing change detection on the target contained in the two-stage images based on the fusion features to obtain the target change detection result.
In some embodiments shown, the performing feature fusion on the first image feature and the second image feature to obtain a fused feature includes:
and performing adaptive feature fusion on the first image features and the second image features to obtain fusion features.
In some embodiments, the adaptively performing feature fusion on the first image feature and the second image feature to obtain a fused feature includes:
predicting a first offset amount corresponding to each feature point in the second image feature and a second offset amount corresponding to each feature point in the first image feature using the first image feature and the second image feature; wherein, the first offset is used for adjusting the receptive field of each feature point in the second image feature; the second offset is used for adjusting the receptive field of each feature point in the first image feature;
adjusting the second image characteristic based on the first offset to obtain an adjusted second image characteristic;
adjusting the first image characteristic based on the second offset to obtain an adjusted first image characteristic;
and performing feature fusion on the adjusted first image features and the adjusted second image features to obtain the fusion features.
In some embodiments, the predicting, by using the first image feature and the second image feature, a first offset corresponding to each feature point in the second image feature and a second offset corresponding to each feature point in the first image feature includes:
superposing the first image feature and the second image feature to obtain a first superposed feature, and performing offset prediction on the first superposed feature to obtain the first offset; and
superposing the first image feature and the second image feature to obtain a second superposed feature, and convolving the second superposed feature to obtain the second offset.
In some embodiments, the performing feature fusion on the adjusted first image feature and the adjusted second image feature to obtain the fused feature includes:
performing optical flow prediction on the adjusted first image characteristic and the adjusted second image characteristic, and determining a corresponding optical flow field; wherein, the optical flow field represents the position error of the same pixel point in the two-stage images;
warping the first image feature by using the optical flow field to obtain a warped first image feature;
and performing feature fusion on the warped first image feature and the adjusted second image feature to obtain the fusion feature.
In some embodiments shown, the performing optical flow prediction on the adjusted first image feature and the adjusted second image feature to determine a corresponding optical flow field includes:
coding the adjusted first image characteristics and the adjusted second image characteristics by utilizing a plurality of coding layers included by the optical flow estimation network to obtain a coding result;
decoding the coding result by utilizing a plurality of decoding layers included by the optical flow estimation network to obtain the optical flow field; in the decoding process, starting from a second layer decoding layer, the input of each decoding layer comprises: the output characteristics output by the decoding layer of the previous layer, the optical flow predicted based on the output characteristics and the characteristics output by the coding layer corresponding to the decoding layer.
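As a concrete illustration of this encode-decode scheme, the following is a minimal PyTorch-style sketch in which each decoding level, from the second level on, receives the previous decoder output, the flow predicted from it, and the features of the corresponding encoding layer. The layer counts, channel widths, and class names are assumptions for illustration, not the architecture specified by this application.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFlowNet(nn.Module):
    """Two encoding and two decoding levels only. From the second decoding
    level on, the decoder input is the previous decoder output, the flow
    predicted from it (upsampled), and the skip features from the matching
    encoding layer. Assumes spatial sizes divisible by 4."""

    def __init__(self, ch=64):
        super().__init__()
        self.enc1 = nn.Conv2d(2 * ch, ch, 3, stride=2, padding=1)   # 1/2 resolution
        self.enc2 = nn.Conv2d(ch, ch, 3, stride=2, padding=1)       # 1/4 resolution
        self.dec2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.dec1 = nn.Conv2d(ch + ch + 2, ch, 3, padding=1)        # prev + skip + flow
        self.flow2 = nn.Conv2d(ch, 2, 3, padding=1)
        self.flow1 = nn.Conv2d(ch, 2, 3, padding=1)

    def forward(self, feat_a, feat_b):
        x = torch.cat([feat_a, feat_b], dim=1)       # adjusted two-phase features
        e1 = F.relu(self.enc1(x))
        e2 = F.relu(self.enc2(e1))
        d2 = F.relu(self.dec2(e2))
        flow_coarse = self.flow2(d2)                 # coarse flow prediction
        up = lambda t: F.interpolate(t, scale_factor=2, mode="bilinear", align_corners=True)
        # Second decoding level: previous output + encoder skip + predicted flow.
        d1 = F.relu(self.dec1(torch.cat([up(d2), e1, up(flow_coarse) * 2.0], dim=1)))
        return self.flow1(d1)                        # optical flow field (1/2 resolution)
```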
In some embodiments, the adaptively performing feature fusion on the first image feature and the second image feature to obtain a fused feature includes:
performing feature fusion on the first image features and the second image features to obtain initial fusion features;
performing parallax feature extraction on the first image feature and the second image feature to obtain a parallax feature; the parallax features represent the matching degree between the feature points of the first image features and the feature points of the second image features;
and determining weight information based on the parallax features, and performing feature selection on the initial fusion features by using the weight information to obtain the fusion features.
In some embodiments, the determining the weight information based on the disparity feature includes:
and superposing the parallax features and the initial fusion features to obtain superposition features, and determining weight information based on the superposition features.
In some illustrative embodiments, the method further comprises:
and determining the sum of the fusion feature and the parallax feature as a final fusion feature.
In some embodiments shown, the degree of match is characterized by a cost value;
the performing parallax feature extraction on the first image feature and the second image feature to obtain a parallax feature includes:
determining a cost space based on the first image feature and the second image feature; the cost space is used for representing cost values between feature points of the first image feature and feature points of the second image feature;
and performing cost aggregation on the cost space to obtain the parallax feature.
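For illustration only, the cost space and cost aggregation described above could be realized as a correlation-style cost volume over a small set of candidate shifts followed by a few convolutions; the shift range, channel widths, and names below are assumptions rather than the specific construction of this application.

```python
import torch
import torch.nn as nn

def build_cost_volume(feat_a, feat_b, max_disp=4):
    """Correlation-style cost space: one matching cost per candidate horizontal
    shift of the second feature map. Output: (N, 2*max_disp+1, H, W)."""
    costs = []
    for d in range(-max_disp, max_disp + 1):
        shifted = torch.roll(feat_b, shifts=d, dims=3)              # shift along width
        costs.append((feat_a * shifted).mean(dim=1, keepdim=True))  # per-pixel cost
    return torch.cat(costs, dim=1)

class CostAggregation(nn.Module):
    """Aggregate the cost space into a parallax (disparity) feature."""
    def __init__(self, max_disp=4, out_ch=64):
        super().__init__()
        self.agg = nn.Sequential(
            nn.Conv2d(2 * max_disp + 1, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
        )

    def forward(self, cost):
        return self.agg(cost)   # parallax feature used for weighting / selection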
In some embodiments, the adaptively performing feature fusion on the first image feature and the second image feature to obtain a fused feature includes:
and sequentially determining each feature point in the first image feature as a current feature point, and executing: determining similarity between each feature point included in the second image feature and the current feature point, and forming first weight information corresponding to the current feature point; wherein, the first weight information represents the similarity between each feature point in the second image feature and the current feature point;
performing feature aggregation on the second image features by using the first weight information to obtain aggregated second image features;
and sequentially determining each feature point in the second image feature as a current feature point, and executing: determining similarity between each feature point included in the first image feature and the current feature point, and forming second weight information corresponding to the current feature point; the second weight information represents the similarity between each feature point in the first image feature and the current feature point;
performing feature aggregation on the first image features by using the second weight information to obtain aggregated first image features;
and overlapping the aggregated first image features and the second image features to obtain the fusion features.
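The similarity-weighted aggregation described above can, for example, be written as a dot-product cross-attention between the two feature maps. The following sketch is illustrative only; the function name, the softmax-normalized scaled dot product, and the concatenation used for superposition are assumptions.

```python
import torch
import torch.nn.functional as F

def cross_aggregate(feat_a, feat_b):
    """Similarity-weighted aggregation between the two feature maps, written as
    scaled dot-product cross-attention. feat_a, feat_b: (N, C, H, W)."""
    n, c, h, w = feat_a.shape
    a = feat_a.flatten(2).transpose(1, 2)   # (N, HW, C)
    b = feat_b.flatten(2).transpose(1, 2)   # (N, HW, C)
    # First weight information: similarity of every point of B to each point of A.
    w_ab = F.softmax(a @ b.transpose(1, 2) / c ** 0.5, dim=-1)       # (N, HW, HW)
    agg_b = (w_ab @ b).transpose(1, 2).reshape(n, c, h, w)           # aggregated second feature
    # Second weight information: similarity of every point of A to each point of B.
    w_ba = F.softmax(b @ a.transpose(1, 2) / c ** 0.5, dim=-1)
    agg_a = (w_ba @ a).transpose(1, 2).reshape(n, c, h, w)           # aggregated first feature
    return torch.cat([agg_a, agg_b], dim=1)  # superpose the aggregated features
```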
In some embodiments, the determining the similarity between each feature point included in the second image feature and the current feature point includes:
determining the similarity between a second neighborhood corresponding to each feature point included in the second image feature and a first neighborhood corresponding to the current feature point; the first neighborhood includes a neighborhood of a first preset size formed by the current feature point as a center together with other feature points around it; the second neighborhood includes a neighborhood of a second preset size formed by the corresponding feature point in the second image feature as a center together with other feature points around it.
In some embodiments, the adaptively performing feature fusion on the first image feature and the second image feature to obtain a fused feature includes:
and sequentially determining each feature point in the first image feature as a current feature point, and executing: determining each feature point included in a third neighborhood corresponding to the feature point with the same position as the current feature point in the second image feature, respectively determining the similarity between each feature point and the current feature point, and forming third weight information corresponding to the current feature point; wherein, the third weight information represents the similarity between each feature point in the third neighborhood and the current feature point; the third neighborhood comprises a neighborhood with a third preset size, which is formed by taking a feature point in the second image feature, which is the same as the current feature point in position, as a center and other feature points around the feature point;
performing feature aggregation on the second image features by using the third weight information to obtain aggregated second image features;
and sequentially determining each feature point in the second image feature as a current feature point, and executing: determining each feature point included in a fourth neighborhood corresponding to the feature point at the same position as the current feature point in the first image feature, respectively determining the similarity between each of these feature points and the current feature point, and forming fourth weight information corresponding to the current feature point; wherein the fourth weight information represents the similarity between each feature point in the fourth neighborhood and the current feature point; the fourth neighborhood includes a neighborhood of a fourth preset size formed by the feature point in the first image feature at the same position as the current feature point as a center together with other feature points around it;
performing feature aggregation on the first image features by using the fourth weight information to obtain aggregated first image features;
and overlapping the aggregated first image features and the second image features to obtain the fusion features.
In some embodiments shown, the image processing comprises at least one of:
image classification; semantic segmentation; instance segmentation; panoptic segmentation; and target detection.
In some illustrative embodiments, the method further comprises:
performing data enhancement on at least one of the two-phase images in at least one of the following ways:
cropping the image; rotating the image; flipping the image; adjusting the brightness of the image; adjusting the contrast of the image; adding Gaussian noise to the image; and adding a registration error to the two-phase images.
In some illustrative embodiments, the target change detection network includes a first input and a second input;
the obtaining, based on the first image feature and the second image feature, an image processing result for at least one of the images in two phases and a change detection result for the target in the image in two phases output by the target change detection network includes:
obtaining a first image processing result for at least one of the images in two phases and a first change detection result for a target in the image in two phases, which are output by the target change detection network, by using the first image feature as a first input and the second image feature as a second input;
obtaining a second image processing result for at least one of the images in two phases and a second change detection result for the target in the image in two phases, which are output by the target change detection network, by using the first image feature as a second input and the second image feature as a first input;
and performing weighted average on the first image processing result and the second image processing result to obtain a final image processing result for at least one stage image in the two-stage images, and performing weighted average on the first change detection result and the second change detection result to obtain a final change detection result for the target in the two-stage images.
In some embodiments shown, in the case that the target change detection network is a network to be trained, the method further includes:
obtaining an image processing loss according to the image processing result of the at least one stage image and the corresponding image processing tag value, and obtaining a change detection loss according to a change detection result of the target in the two stage images and the corresponding change detection tag value;
and adjusting network parameters of the target change detection network based on the image processing loss and the change detection loss.
In some embodiments, the adjusting network parameters of the target change detection network based on the image processing loss and the change detection loss includes:
obtaining a total loss according to the image processing loss and the change detection loss;
and adjusting the network parameters of the target change detection network based on the total loss.
The present application also proposes an image processing apparatus, which may include:
the characteristic extraction module is used for extracting the characteristics of the two-stage images of the input target change detection network to obtain a first image characteristic and a second image characteristic;
and the image processing module is used for obtaining an image processing result aiming at least one stage of image in the two stages of images and a change detection result aiming at the target in the two stages of images, which are output by the target change detection network, on the basis of the first image characteristic and the second image characteristic.
The present application further provides an electronic device, the above device including: a processor; a memory for storing the processor-executable instructions; wherein, the processor is configured to call the executable instructions stored in the memory to implement the image processing method as shown in any one of the foregoing embodiments.
The present application also proposes a computer-readable storage medium storing a computer program for executing the image processing method as shown in any one of the foregoing embodiments.
In this scheme, the target change detection network performs image processing on the input two-phase images to obtain an image processing result for at least one of the two-phase images and a change detection result for the target in the two-phase images. On the one hand, the image processing result and the target change detection result can therefore be output simultaneously, which improves network utilization;
on the other hand, when the target change detection network is trained, an image processing loss can be obtained from the image processing result and the corresponding image processing tag value, a change detection loss can be obtained from the change detection result and the corresponding change detection tag value, and the network parameters of the target change detection network can be adjusted based on both. The image processing task thereby assists the training of the target change detection task, realizing joint training of the two tasks, so that a large number of training samples with change detection tag values need not be constructed, the imbalance between positive and negative samples is alleviated, the overall training efficiency is improved, and the target change detection performance of the network is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
In order to more clearly illustrate one or more embodiments of the present application or the technical solutions in the related art, the drawings needed for describing the embodiments or the related art are briefly introduced below. Obviously, the drawings in the following description are only some of the embodiments described in one or more embodiments of the present application, and other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is a schematic view of a building change detection process shown in the present application;
FIG. 2 is a method flow diagram of an image processing method shown in the present application;
FIG. 3 is a schematic diagram of a building change network architecture shown in the present application;
FIG. 4 is a method flow diagram of an image processing method shown in the present application;
FIG. 5a illustrates an adaptive feature fusion method according to the present application;
FIG. 5b illustrates an adaptive feature fusion method according to the present application;
FIG. 6 is a schematic illustration of a warping process shown in the present application;
FIG. 7 is a schematic flow chart of an adaptive feature fusion method according to the present application;
FIG. 8 is a schematic view of a feature aggregation process shown in the present application;
FIG. 9 is a method flow diagram of a building change detection method shown herein;
fig. 10 is a schematic structural diagram of an image processing apparatus shown in the present application;
fig. 11 is a hardware configuration diagram of an electronic device according to the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It should also be understood that the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.
In the related art, the building change detection is generally performed using a neural network technique.
Referring to fig. 1, fig. 1 is a schematic view illustrating a building change detection process according to the present application.
As shown in fig. 1, a building change detection network for building change detection may include a feature extraction sub-network and a building change detection sub-network (hereinafter referred to as a detection sub-network). Wherein the output of the feature extraction subnetwork is the input of the detection subnetwork.
The feature extraction sub-network is used for performing feature extraction on the first image and the second image to obtain a first image feature and a second image feature. At least the following two approaches are possible: first, feature extraction may be performed on the first image and the second image successively by the same feature extraction sub-network; second, feature extraction may be performed on the first image and the second image respectively by two feature extraction sub-networks that share weights.
The first image and the second image may be remote sensing images including a building, which are taken at different times for the same location.
The feature extraction sub-network may be a network determined based on a convolutional neural network. For example, the feature extraction sub-network may be constructed based on architectures such as VGGNet (Visual Geometry Group Network), ResNet (Residual Network), Inception, DenseNet (Densely Connected Network), and the like. The structure of the feature extraction sub-network is not particularly limited in this application.
The first image feature and the second image feature are image features corresponding to the first image and the second image, respectively. In some examples, the first image feature and the second image feature may comprise a feature map of multiple channels.
The detector sub-network is configured to obtain a building change detection result based on the first image feature and the second image feature. In some examples, the building change detection may be a pixel-level detection, that is, the change detection result may indicate whether each pixel in the first image and the second image has changed and a confidence level of the change.
In some examples, the detection sub-network may include semantic segmentation networks such as FCN (Fully Convolutional Networks), SegNet, and the like. The semantic segmentation network can perform operations such as convolution, down-sampling and up-sampling on the image features, and then map the image features of the input image onto a feature map of the same size, so that a prediction can be generated for each pixel in the feature map while the spatial information of the input image is retained. Finally, pixel-by-pixel classification is performed on the feature map to obtain a pixel-level change detection confidence value. The specific structure of the detection sub-network is not particularly limited in this application.
When building change detection is performed based on a first image and a second image, the first image and the second image may be input into a feature extraction sub-network to obtain a first image feature and a second image feature.
Then, the first image feature and the second image feature are input into a detector sub-network to obtain a detection result.
Currently, a supervised mode is usually adopted for training the building change detection network. That is, a training sample with a change detection tag value needs to be constructed. And then network training is carried out in a supervision mode.
However, in practice there are usually fewer changed building areas (changed samples can be regarded as positive samples) and more unchanged building areas (unchanged samples can be regarded as negative samples). When training samples are constructed from remote sensing images, the number of negative samples therefore far exceeds the number of positive samples, which causes a serious sample imbalance, hampers network training and convergence, and thereby degrades the change detection performance of the network.
In view of the above, the present application proposes an image processing method. The method enables a target change detection network to perform image processing on two input phase images to obtain an image processing result for at least one of the two-phase images and a change detection result for the target in the two-phase images. On the one hand, the image processing result and the target change detection result can therefore be output simultaneously, which improves network utilization;
on the other hand, when the target change detection network is trained, an image processing loss can be obtained from the image processing result and the corresponding image processing tag value, a change detection loss can be obtained from the change detection result and the corresponding change detection tag value, and the network parameters of the target change detection network can be adjusted based on both. The image processing task thereby assists the training of the target change detection task, realizing joint training of the image processing task and the change detection task, so that a large number of training samples with change detection tag values need not be constructed, the imbalance between positive and negative samples is alleviated, the overall training efficiency is improved, and the target change detection performance of the network is improved.
The image processing method can be applied to a training task of a target change detection network, and can also be applied to a test or application task of the target change detection network.
Referring to fig. 2, fig. 2 is a flowchart illustrating a method of image processing according to the present application. The image processing process and the target detection process mentioned in fig. 2 may refer to an image processing task and a target change detection task of a target involved in a training task, or may refer to an image processing task and a target change detection task of a target involved in a test or application task of a target change detection network.
The method may include:
s202, performing feature extraction on the two-stage image of the input target change detection network to obtain a first image feature and a second image feature;
and S204, obtaining an image processing result aiming at least one stage image in the two-stage images and a change detection result aiming at the target in the two-stage images, which are output by the target change detection network, based on the first image characteristics and the second image characteristics.
The target change detection network can be applied to a network training task, an image processing task and a target change detection task. The target can be any object preset according to business requirements. For example, the object may be a building, vegetation, a vehicle, a person, or the like. The target change detection task may be change detection for buildings, vehicles, people, vegetation, and the like.
In the present application, a description will be given taking an example in which the target change detection task is a building change detection task. It will be appreciated that the change detection task for any object may refer to a building change detection task and is not described in detail in this application.
Referring to fig. 3, fig. 3 is a schematic diagram of a building change network structure according to the present application.
As shown in fig. 3, the building change detection network may include a twin feature extraction sub-network, an image processing sub-network, and a building change detection sub-network, wherein the twin feature extraction sub-network shares its parameter weights across the two branches, and the image processing sub-network and the building change detection sub-network, which are arranged in parallel, are each connected to the twin feature extraction sub-network.
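For orientation, a minimal sketch of such a network is shown below. It is PyTorch-style and illustrative only: the encoder depth, channel sizes, the concatenation-based fusion, and all names are assumptions rather than the exact structure of this application.

```python
import torch
import torch.nn as nn

class BuildingChangeDetectionNet(nn.Module):
    """Sketch of the three-part structure: twin (shared-weight) encoder plus
    two parallel heads. Depths, channel widths and the concatenation-based
    fusion are placeholders."""

    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        # Twin feature extraction sub-network: a single encoder applied to both
        # phase images, so the weights are shared by construction.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Image processing sub-network (here: building semantic segmentation).
        self.seg_head = nn.Conv2d(feat_ch, 1, 1)
        # Building change detection sub-network applied to the fused features.
        self.change_head = nn.Conv2d(2 * feat_ch, 1, 1)

    def forward(self, img_a, img_b):
        feat_a = self.encoder(img_a)            # first image feature
        feat_b = self.encoder(img_b)            # second image feature
        seg_logits = self.seg_head(feat_a)      # image processing result (for image A)
        fused = torch.cat([feat_a, feat_b], dim=1)   # simple fusion by superposition
        change_logits = self.change_head(fused)      # pixel-level change detection result
        return seg_logits, change_logits
```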
The image processing sub-network may perform image processing on the first image feature and/or the second image feature to obtain the image segmentation result.
It will be appreciated that different image processing operations may correspond to different configurations of image processing sub-networks.
For example, when the image processing is semantic segmentation, the image processing sub-network may be a semantic segmentation sub-network for producing a semantic segmentation result, such as FCN (Fully Convolutional Networks), SegNet, or a similar semantic segmentation sub-network.
For another example, when the image processing is image classification, the image processing sub-network may be a classification sub-network for detecting image classification results, such as a two-classifier or a multi-classifier constructed based on a deep convolutional neural network.
For another example, when the image processing is instance segmentation or panoptic segmentation, the image processing sub-network may be a segmentation sub-network for producing an image segmentation result, such as Mask R-CNN (Mask Region-based Convolutional Neural Network).
For another example, when the image processing is target detection, the image processing sub-network may be a target detection sub-network for detecting the region where a target is located, such as a sub-network constructed based on R-CNN (Region-based Convolutional Neural Network), Fast R-CNN, or Faster R-CNN.
In some examples, the image processing includes, but is not limited to, any of: image classification; semantic segmentation; instance segmentation; panoptic segmentation; and target detection.
Any of the image processing methods can perform image processing by using the image features extracted by the feature extraction sub-network, so that the effect of training the feature extraction sub-network to assist in training the building change detection task can be achieved.
In some examples, the image processing may be semantic segmentation to facilitate labeling of training samples. Because pixel-level labeling is usually required in constructing a sample for training a building change detection task, when the image processing is semantic segmentation, labeling of a semantic segmentation label value can be performed on each pixel in constructing the sample for training the building change detection task, so as to construct the sample for training the semantic segmentation task at the same time, thereby simplifying the construction difficulty of the training sample.
In the building change detection task, the image processing method can be applied to a training task of a building change detection network, and can also be applied to a test or application task of the building change detection network, and the same or similar method can be adopted in both tasks to realize the image processing task of the building in the remote sensing image and the change detection task of the building. For the sake of understanding, the present application first takes the training task of the building change detection network as an example, and of course, the image processing and change detection process involved therein may also be applied to the testing or application task of the building change detection network.
The training task is an end-to-end training method for the building change detection network, namely training is carried out on the building change detection network, namely training of each sub-network included in the network is completed. Of course, in some examples, the training may be performed on each sub-network in advance, and the present application is not particularly limited thereto.
The training method can be applied to electronic equipment. The electronic device may execute the training method by installing a software system corresponding to the training method. The electronic device may be a notebook computer, a server, a mobile phone, a PAD terminal, etc., and is not particularly limited in this application. The electronic device may be a client device or a server device, and is not particularly limited herein.
Referring to fig. 4, fig. 4 is a flowchart illustrating a method of image processing according to the present application.
As shown in fig. 4, the above method may include,
s402, performing feature extraction on the two-stage image sample input into the building change detection network to obtain a first image feature and a second image feature.
It will be appreciated that if the building change detection network is the network to be trained, then a preparation operation may be performed prior to network training. The preparation operations include, but are not limited to, constructing a training sample set, determining a network structure, initializing network parameters, determining training times, and other hyper-parameters. The above preparation operation will not be described in detail here.
The two-stage images may be two-stage remote sensing images including buildings, which are taken at different times for the same location. Illustratively, the training sample may be constructed using remote sensing images that contain buildings as well as remote sensing images that do not contain buildings.
In some examples, the same feature extraction sub-network may be used to successively perform feature extraction on the first image and the second image, so as to obtain the first image feature and the second image feature.
In some examples, in order to improve network performance, feature extraction may be performed on the images of the two phases by using a twin feature extraction sub-network as shown in fig. 3, so as to obtain a first image feature corresponding to a first image in the images of the two phases and a second image feature corresponding to a second image in the images of the two phases.
For example, the twin feature extraction sub-network described above may be based on a convolutional network constructed by the VGGNet architecture. The convolution network may be utilized to perform operations such as convolution, pooling, etc. on the two-phase image respectively to obtain the first image feature and the second image feature. The structure of the twin feature extraction subnetwork described above is not particularly limited in this application. It is understood that in some examples, to improve the feature extraction effect, modules such as attention mechanism, pyramid pooling, and the like may be further added to the sub-networks.
The twin feature extraction sub-network is used for extracting the features of the images in the two stages, so that the feature extraction efficiency can be improved, the structure of the building change detection network is simplified, the network operation amount is reduced, and the network performance is improved.
After obtaining the first image feature and the second image feature, the method may further include:
s404, obtaining an image processing result for at least one of the two-stage images and a building change detection result for the two-stage image, which are output by the building change detection network, based on the first image feature and the second image feature.
In some examples, the image processing sub-network shown in fig. 3 may be used to perform image processing on the first image feature and/or the second image feature to obtain the image processing result.
It is understood that the above step covers three schemes: processing the first image feature, processing the second image feature, or processing both. The following description takes as an example the case in which the image processing sub-network performs image processing on the first image feature to obtain an image segmentation result.
For example, when the image processing is semantic segmentation, after performing operations such as several convolutions and downsampling on the first image feature by using a semantic segmentation sub-network (e.g., FCN), the intermediate feature can be restored to a feature map with the same size as the first image by performing upsampling by using a deconvolution layer, so that a prediction can be generated for each pixel in the feature map while preserving spatial information in the first image. And finally, classifying whether the pixels are foreground or not one by one on the feature map obtained after the up-sampling so as to obtain a semantic segmentation result corresponding to the first image.
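As an illustration of this downsample-then-deconvolve pattern, a minimal sketch of an FCN-style segmentation head follows; the layer sizes, class count, and class name are assumptions, not the sub-network prescribed above.

```python
import torch.nn as nn

class FCNSegHead(nn.Module):
    """Sketch of an FCN-style head: a few strided convolutions downsample the
    first image feature, and a transposed convolution (deconvolution) restores
    the original resolution so that a foreground/background prediction is
    produced for every pixel. Assumes spatial dims divisible by 4."""

    def __init__(self, in_ch=64, num_classes=2):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(in_ch, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Deconvolution layer that upsamples 4x back to the input feature size.
        self.up = nn.ConvTranspose2d(128, num_classes, kernel_size=8, stride=4, padding=2)

    def forward(self, feat):
        x = self.down(feat)   # roughly H/4 x W/4
        return self.up(x)     # per-pixel class logits at H x W
```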
In the above scheme, if the building change detection network is a network to be trained, a semantic segmentation sub-network may be used to obtain a semantic segmentation result of the first image, and then a semantic segmentation loss may be obtained by using a pre-labeled semantic segmentation label value corresponding to the first image, so as to implement training of an image processing task, and further implement the above joint training.
In some examples, the first image feature and the second image feature may both be input into the building change detection sub-network for change detection, so as to obtain a building change detection result.
In some examples, the building change detection subnetwork shown in fig. 3 may be used to perform feature fusion on the first image feature and the second image feature to obtain a fusion feature, and then perform change detection on the building included in the two-stage image based on the fusion feature to obtain the building change detection result.
For example, the detection sub-network is a semantic segmentation network constructed based on FCN. During change detection, several convolution and down-sampling operations may be performed on the fused features, and then up-sampling may be performed using a deconvolution layer to restore the intermediate features to a feature map of the same size as the two-phase images, so that a prediction can be generated for each pixel in the feature map while the spatial information of the input images is retained. Finally, pixel-by-pixel classification of whether a change has occurred is performed on the up-sampled feature map to obtain the building change detection result for the two-phase images.
In this example, since feature fusion is performed first, information carried by the first image feature and the second image feature is utilized, and information brought by fusion of the two image features is combined, so that performance of a network for detecting building change is improved.
In this embodiment, if the building change detection network is a trained network, the output image processing result and the building change detection result may be directly used as the final result. If the building change detection network is the network to be trained, after obtaining the image processing result and the building change detection result, the following steps can be executed:
s406, obtaining an image processing loss according to the image processing result and the corresponding image processing label value, and obtaining a change detection loss according to the change detection result and the corresponding change detection label value.
The image processing label value is specifically a true value labeled when constructing an image processing task training sample. It will be appreciated that different image processing operations correspond to different image processing tag values.
For example, when the image processing is semantic segmentation, the image processing tag value may be a semantic segmentation tag value indicating whether each pixel point of the input sample is foreground. For example, a label value of 1 indicates that a pixel is a foreground, and 0 indicates that a pixel is a background.
For another example, when the image processing is image classification, the above-described image processing tag value may be an image classification tag value indicating whether the input sample is a building image. For example, a label value of 1 may indicate that the image is a building image, and 0 indicates that the image is not a building image.
For another example, when the image processing is instance segmentation or panoptic segmentation, the image processing tag value may be an instance segmentation or panoptic segmentation tag value indicating the building bounding boxes contained in the input sample and the numbers of those bounding boxes. For example, an instance segmentation tag value may include a tag value indicating the type of the object within a bounding box and the location information of the bounding box. For another example, when the image processing is target detection, the image processing tag value may be a target detection tag value indicating the building bounding boxes contained in the input sample. For example, the target detection tag value may include the coordinate information of the building bounding boxes contained in the image and a tag value indicating whether the object within each bounding box is a building.
The change detection tag value is the true value labeled for the building change detection task training sample. For example, the change detection tag value may be a tag value indicating whether each pixel point of the input sample has undergone a change.
In some examples, an image processing loss error between the image processing result and the corresponding image processing tag value may be determined based on a preset first loss function. A change detection loss error between the change detection result and the corresponding change detection tag value may then be determined based on a preset second loss function.
The first loss function and the second loss function may be cross entropy loss functions, exponential loss functions, mean square error loss functions, and the like. The specific types of the first loss function and the second loss function are not limited in this application.
After the image processing loss and the change detection loss are obtained, S408 may be executed to adjust network parameters of the building change detection network based on the image processing loss and the change detection loss.
In some examples, the total loss may be obtained as the sum of the above image processing loss and the change detection loss, and the network parameters of the building change detection network may then be adjusted based on the total loss. In some examples, the two losses may also be combined in other ways, such as by subtraction or multiplication, to obtain the total loss, which is not described in detail here.
In some examples, after the total loss is obtained, the magnitude of the current gradient decrease may be determined, for example, using a random gradient decrease method. And then adjusting the network parameters of the building change detection network by utilizing back propagation according to the amplitude. It is understood that other ways of adjusting the network parameters may be used in practical applications, and are not described in detail herein.
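A minimal sketch of this joint objective and one parameter-update step is given below. Binary cross-entropy and SGD are example choices consistent with, but not mandated by, the description above, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def joint_loss(seg_logits, seg_labels, change_logits, change_labels):
    """Image processing loss + change detection loss; binary cross-entropy is
    used here purely as an example of the first/second loss functions."""
    seg_loss = F.binary_cross_entropy_with_logits(seg_logits, seg_labels)
    change_loss = F.binary_cross_entropy_with_logits(change_logits, change_labels)
    return seg_loss + change_loss  # total loss as the sum of the two losses

# One illustrative training step:
# optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
# seg_logits, change_logits = net(img_a, img_b)
# loss = joint_loss(seg_logits, seg_gt, change_logits, change_gt)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```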
The operations of S402-S408 may then be repeated using the constructed training sample set until the building change detection network converges to complete the joint training. It should be noted that the above convergence condition may be, for example, that a preset training time is reached, or a variation of the obtained joint learning loss function after M (M is a positive integer greater than 1) consecutive forward propagations is smaller than a certain threshold. The present application does not specifically limit the conditions for model convergence.
In the above arrangement, the method may cause the building change detection network to perform three tasks. The first aspect is a feature extraction task, namely, feature extraction is carried out on an input two-stage image to obtain extracted features; the second aspect is an image processing task, namely, an image processing result for at least one stage of image in the two stages of images is obtained by using the extracted features output by the feature extraction task; the third aspect is a building change detection task for obtaining a building change detection result for the two-stage image by using the extracted features output by the feature extraction task.
Therefore, during training, based on the image processing result and the corresponding image processing label value, the image processing loss and the change detection result and the corresponding change detection label value are obtained, the network parameters of the building change detection network are adjusted, training of the characteristic extraction task can be assisted through training of the image processing task, and then joint training of the image processing task and the building change detection task is achieved, a large number of training samples with the change detection label values are not needed, the problem of unbalance of positive and negative samples is favorably solved, the overall training efficiency is improved, and the building change detection performance of the network is improved.
In some examples, at least one of the following data enhancements may be employed in constructing the training sample set: cropping the sample image; rotating the sample image; flipping the sample image; adjusting the brightness of the sample image; adjusting the contrast of the sample image; adding Gaussian noise to the sample image; and adding a registration error to the two-phase image samples.
Relatively abundant training samples can be obtained through data enhancement, which alleviates to a certain extent the overfitting problem usually encountered when training a deep learning model and enhances the robustness of the model.
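As an illustration, the following hypothetical helpers sketch two of the augmentations listed above, including a simulated registration error between the two phase images; the shift range and the implementation details are assumptions.

```python
import torch

def add_registration_error(img_a, img_b, max_shift=4):
    """Simulate a registration error between the two phase images by shifting
    one of them a few pixels. img_a, img_b: (C, H, W) tensors; the shift range
    is an arbitrary example."""
    dx = int(torch.randint(-max_shift, max_shift + 1, (1,)))
    dy = int(torch.randint(-max_shift, max_shift + 1, (1,)))
    return img_a, torch.roll(img_b, shifts=(dy, dx), dims=(1, 2))

def add_gaussian_noise(img, std=0.01):
    """Add Gaussian noise to one image."""
    return img + std * torch.randn_like(img)
```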
In some examples, to further improve network robustness, after image processing and change detection have been performed to obtain an image processing result (hereinafter the first image processing result) and a change detection result (hereinafter the first change detection result), the input positions of the first image feature and the second image feature may be exchanged, and image processing and change detection may be performed again to obtain a second image processing result and a second change detection result. Then, the first image processing result and the second image processing result are weighted and averaged to obtain a final image processing result, and the first change detection result and the second change detection result are weighted and averaged to obtain a final change detection result. The weights are not particularly limited in this application; for example, the weight may be 0.5.
In the above example, the input positions of the first image feature and the second image feature may be exchanged and then change detection may be performed again, and the final image processing result is a weighted average result of the first image processing result and the second image processing result, and the final change detection result is a weighted average result of the first change detection result and the second change detection result, so that an error caused by an input order of the first image and the second image may be avoided, and thus building change detection accuracy and robustness of the network may be improved.
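A minimal sketch of this swapped-input inference with weighted averaging (using 0.5 as the example weight mentioned above; the function and variable names are illustrative) might look like:

```python
import torch

@torch.no_grad()
def predict_with_swap(net, img_a, img_b, w=0.5):
    """Run the network with both input orders and take the weighted average of
    the two results."""
    seg1, change1 = net(img_a, img_b)   # first results: A as the first input
    seg2, change2 = net(img_b, img_a)   # second results: inputs exchanged
    final_seg = w * seg1 + (1 - w) * seg2
    final_change = w * change1 + (1 - w) * change2
    return final_seg, final_change
```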
With continued reference to FIG. 3, a fusion unit and a detection unit may be included in the building change detection subnetwork.
The fusion unit is used for fusing the first image characteristic and the second image characteristic to obtain a fusion characteristic. The detection unit may be a semantic segmentation network, and may perform pixel-level semantic segmentation based on the fusion features to determine whether a building in the input two-stage image has changed.
In some examples, the above-described fusion may be performed in the above-described fusion unit using operations such as superposition, addition, multiplication, and the like.
In practical applications, although the two input phase remote sensing images are taken of the same place, on the one hand, due to the shooting angle of the remote sensing image, the roof and the base of a building appearing in the image are often offset from each other (hereinafter referred to as the offset), so that the building appears deformed; on the other hand, a registration deviation often exists between the two phase remote sensing images after image registration, i.e., the same building may appear at different positions in the two-phase images. Both of these factors are likely to affect the building change detection result and cause false alarms.
In some examples, in order to reduce the false alarm rate and improve the building change detection accuracy, adaptive feature fusion may be performed on the first image feature and the second image feature to obtain a fusion feature.
Due to the adoption of the self-adaptive feature fusion, the self-adaptive feature fusion can be carried out according to the respective image features of the two input images, and compared with fusion modes such as superposition, addition, multiplication and the like, the error caused by the deviation and the registration deviation can be corrected in the fusion process, so that the building change detection accuracy is improved, and the false alarm rate is reduced.
In some examples, the field of view of each feature point in the first image feature and the second image feature may be expanded based on the deformable convolution, and then the first image feature and the second image feature may be adjusted based on the expanded field of view, so as to obtain the first image feature and the second image feature that eliminate the error caused by the offset.
The receptive field is the size of the region of the input image onto which a pixel of the feature map output by each layer of the convolutional neural network is mapped. For example, taking a 3 x 3 convolution kernel as an example, in conventional convolution the receptive field of a point on the feature map covers only the 9 mutually adjacent pixel points above, below, to the left and to the right of it, which is small. In this application, the receptive field of the point on the input image can be enlarged through deformable convolution.
Specifically, S502 may be executed first, and a first offset amount corresponding to each feature point in the second image feature and a second offset amount corresponding to each feature point in the first image feature may be predicted by using the first image feature and the second image feature; wherein, the first offset is used for adjusting the receptive field of each feature point in the second image feature; the second offset amount is used to adjust the reception field of each feature point in the first image feature.
Then, the second image feature can be adjusted based on the first offset to obtain an adjusted second image feature; and adjusting the first image feature by a deformable convolution based on the second offset amount to obtain an adjusted first image feature.
Finally, feature fusion can be performed on the adjusted first image features and the adjusted second image features to obtain the fusion features.
The first offset amount and the second offset amount are position offset amounts corresponding to a feature point in an image feature. For example, the above-described offset amount may be an amount of displacement of the characteristic point in the X-axis and Y-axis directions. The offset is used to adjust the reception field of each feature point in the second image feature. For example, in a conventional convolution operation with a convolution kernel of 3 × 3, it is common to convolve the convolution kernel with a region formed by 3 × 3 feature points around a feature point, so that the receptive field for the feature point is limited to only the region. And through the offset, when performing deformable convolution, the convolution kernel and the region formed by the characteristic point determined after the offset is performed according to the offset can be convolved, so that the receptive field of the characteristic point can be adjusted, the determined receptive field is larger than that of the conventional convolution, and the error caused by the offset of the roof and the base is eliminated.
For example, in a remote sensing image containing a building, the roof and the base of the building are typically offset (that is, the base and the roof are at different positions in the remote sensing image). By determining the offset and performing the deformable convolution based on it, the receptive field of a pixel in the image can be enlarged so that the base and roof features of the same building fall within the receptive field at the same time, eliminating the change detection error that arises when, due to the offset, the roof and base features cannot be perceived within one receptive field.
Referring to fig. 5a and 5b, fig. 5a and 5b illustrate an adaptive feature fusion method according to the present application. Schematically, in fig. 5a and 5b, XA represents the first image feature, XB represents the second image feature, XA _ d represents the first image feature adjusted by the deformable convolution operation, and XB _ d represents the second image feature adjusted by the deformable convolution operation.
In some examples, to improve the feature fusion effect, the positions of the first image feature and the second image feature may be swapped when determining the first offset and the second offset, so as to eliminate the influence of the input order of the input images.
As shown in fig. 5a, the first image feature and the second image feature may be superimposed, with the first image feature placed before the second image feature (that is, the feature layers of the first image feature precede those of the second image feature), to obtain a first superimposed feature. Offset prediction is then performed on the first superimposed feature to obtain the first offset.
As shown in fig. 5b, the first image feature and the second image feature may be superimposed, with the second image feature placed before the first image feature (that is, the feature layers of the second image feature precede those of the first image feature), to obtain a second superimposed feature. The second superimposed feature is then convolved to obtain the second offset.
The second image feature may then be adjusted by a deformable convolution using the first offset, and the first image feature by a deformable convolution using the second offset, to obtain the adjusted first image feature and the adjusted second image feature. Finally, feature fusion can be performed on the adjusted first image feature and the adjusted second image feature to obtain the fusion feature.
In this example, when the first offset and the second offset are determined, the positions of the first image feature and the second image feature can be exchanged, so that the influence caused by the input position of the input image is eliminated, and the building change detection accuracy and robustness are improved.
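The flow of figs. 5a and 5b can be sketched with a deformable convolution, for example via torchvision.ops.DeformConv2d in PyTorch. The module and tensor names below (AdaptiveDeformFusion, offset_pred, xa/xb) are illustrative assumptions, not the patented implementation; they only show order-swapped offset prediction followed by deformable adjustment and superposition.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class AdaptiveDeformFusion(nn.Module):
    """Order-swapped offset prediction + deformable adjustment (illustrative sketch)."""
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        # Predicts a (2 * k * k)-channel offset map from the superimposed features.
        self.offset_pred = nn.Conv2d(2 * channels, 2 * kernel_size * kernel_size,
                                     kernel_size=3, padding=1)
        # Deformable convolutions that adjust each feature using the predicted offsets.
        self.deform_a = DeformConv2d(channels, channels, kernel_size, padding=kernel_size // 2)
        self.deform_b = DeformConv2d(channels, channels, kernel_size, padding=kernel_size // 2)

    def forward(self, xa: torch.Tensor, xb: torch.Tensor) -> torch.Tensor:
        # Fig. 5a: XA before XB -> first offset, used to adjust XB.
        offset_1 = self.offset_pred(torch.cat([xa, xb], dim=1))
        # Fig. 5b: XB before XA -> second offset, used to adjust XA.
        offset_2 = self.offset_pred(torch.cat([xb, xa], dim=1))
        xb_d = self.deform_b(xb, offset_1)   # adjusted second image feature XB_d
        xa_d = self.deform_a(xa, offset_2)   # adjusted first image feature XA_d
        # Simple fusion of the adjusted features (superposition along the channel axis).
        return torch.cat([xa_d, xb_d], dim=1)
```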
In some examples, to further improve building change detection accuracy and reduce the false alarm rate, the image feature corresponding to either image may be warped (warping includes translation and rotation) using the optical flow field between the two input images, so that the same building is located at the same position in the two-phase images and errors caused by registration deviation are eliminated.
It will be appreciated that the above warping may follow three schemes: warping the first image feature, warping the second image feature, or warping both. The following description takes warping the first image feature as an example.
Referring to fig. 6, fig. 6 is a schematic diagram illustrating a warping process according to the present application. Schematically, in fig. 6, the first image feature adjusted by the deformable convolution operation is denoted by XA_d, the second image feature adjusted by the deformable convolution operation is denoted by XB_d, and the fused feature is denoted by XAB.
As shown in fig. 6, S602 may be executed first, and optical flow prediction is performed on the adjusted first image feature and the adjusted second image feature by using an optical flow estimation network (optical flow estimation unit in fig. 6), so as to determine a corresponding optical flow field; and the optical flow field represents the position error of the same pixel point in the two-stage images.
The optical flow estimation network may be a neural network built on a structure such as FlowNetSimple (FlowNetS) or FlowNetCorr (the correlation-based FlowNet variant).
In some examples, a FlowNetCorr structure may be employed. The optical flow estimation network can adopt a common optical flow estimation loss function, for example a self-supervised optical flow loss [the formula is given only as an image in the original filing] in which the image Is is warped by the estimated optical flow to obtain a predicted image It', and SSIM represents the structural similarity between the predicted image It' and the real image It.
As shown in fig. 6, the optical flow estimation unit may include an encoding unit and a decoding unit.
The coding unit may include a plurality of coding layers constructed by convolutional layers and pooling layers. In the encoding unit, the adjusted first image feature and the adjusted second image feature may be encoded to obtain an encoding result.
The decoding unit may include a plurality of decoding layers corresponding to the encoding layers one to one. In the decoding unit, the encoding result may be decoded to obtain the optical flow field; in the decoding process, starting from a second layer decoding layer, the input of each decoding layer comprises: the output characteristics output by the decoding layer of the previous layer, the optical flow predicted based on the output characteristics and the characteristics output by the coding layer corresponding to the decoding layer.
This way of predicting the optical flow field retains both the high-order information carried by the coarse feature maps and the fine local information provided by the low-level feature maps, so that the predicted optical flow field is more accurate, the feature fusion effect is improved, and the change detection accuracy is increased.
For example, when the optical flow field between the first image feature and the second image feature is predicted, the optical flow field is estimated by adopting the optical flow estimation network with the FlowNetCorr structure, so that the overall high-order information of the first image feature and the second image feature can be retained, and the fine local information obtained after down-sampling the image features can be retained, so that the predicted optical flow field is more accurate, and the change detection accuracy is further improved.
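A minimal coarse-to-fine flow estimator with per-level flow prediction and encoder skip connections, in the spirit of the encoding/decoding description above, might look as follows. The layer sizes and names are assumptions made for the sketch; a real implementation would use a FlowNetS/FlowNetC backbone.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFlowEstimator(nn.Module):
    """Two-level encoder-decoder flow estimator (illustrative only)."""
    def __init__(self, in_channels: int, base: int = 32):
        super().__init__()
        # Encoding unit: two conv + downsample layers applied to the superimposed features.
        self.enc1 = nn.Sequential(nn.Conv2d(2 * in_channels, base, 3, stride=2, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(base, 2 * base, 3, stride=2, padding=1), nn.ReLU())
        # Decoding unit: each level sees the previous output, the upsampled flow, and the skip feature.
        self.dec2 = nn.Sequential(nn.Conv2d(2 * base, base, 3, padding=1), nn.ReLU())
        self.flow2 = nn.Conv2d(base, 2, 3, padding=1)
        self.dec1 = nn.Sequential(nn.Conv2d(base + 2 + base, base, 3, padding=1), nn.ReLU())
        self.flow1 = nn.Conv2d(base, 2, 3, padding=1)

    def forward(self, xa_d: torch.Tensor, xb_d: torch.Tensor) -> torch.Tensor:
        x = torch.cat([xa_d, xb_d], dim=1)
        e1 = self.enc1(x)            # skip feature for the last decoder layer
        e2 = self.enc2(e1)           # coarsest encoding result
        d2 = self.dec2(e2)
        f2 = self.flow2(d2)          # coarse flow prediction
        d2_up = F.interpolate(d2, scale_factor=2, mode="bilinear", align_corners=False)
        f2_up = F.interpolate(f2, scale_factor=2, mode="bilinear", align_corners=False) * 2
        # Input of the next decoder layer: previous output, predicted flow, matching encoder feature.
        d1 = self.dec1(torch.cat([d2_up, f2_up, e1], dim=1))
        flow = self.flow1(d1)
        # Upsample to the resolution of the input features.
        return F.interpolate(flow, scale_factor=2, mode="bilinear", align_corners=False) * 2
```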
After obtaining the optical flow field, S604 may be executed, and in the warping unit, the first image feature is warped by using the optical flow field, so as to obtain a warped first image feature.
The warping operation may include translation and rotation.
Through the warping operation, feature alignment of the first image feature and the second image feature can be achieved, and therefore change detection errors caused by registration deviation of the first image and the second image are eliminated.
Finally, S606 may be executed, and in the superimposing unit, feature fusion is performed on the warped first image feature and the adjusted second image feature, so as to obtain the fused feature.
The obtained fusion characteristics can eliminate the change detection error of the first image and the second image caused by registration deviation, thereby improving the change detection accuracy of the network and reducing the false alarm rate.
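The warping step can be sketched with torch.nn.functional.grid_sample; the helper name warp_by_flow and the flow convention (pixel displacements along x and y) are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def warp_by_flow(feature: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp a feature map (N, C, H, W) with a flow field (N, 2, H, W)."""
    n, _, h, w = feature.shape
    # Base sampling grid of pixel coordinates.
    ys, xs = torch.meshgrid(torch.arange(h, device=feature.device),
                            torch.arange(w, device=feature.device), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0).expand(n, -1, -1, -1)
    # Shift each position by the predicted flow, then normalize to [-1, 1] for grid_sample.
    coords = grid + flow
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    norm_grid = torch.stack((coords_x, coords_y), dim=-1)  # (N, H, W, 2)
    return F.grid_sample(feature, norm_grid, mode="bilinear",
                         padding_mode="border", align_corners=True)

# Usage in the fusion unit (XAB denotes the fused feature, as in fig. 6):
# xa_w = warp_by_flow(xa_d, flow)
# xab = torch.cat([xa_w, xb_d], dim=1)
```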
In some examples, adaptive feature fusion may be implemented based on a disparity estimation network.
Referring to fig. 7, fig. 7 is a schematic flow chart of an adaptive feature fusion method according to the present application. Schematically, in fig. 7, XA represents the first image feature, XB represents the second image feature, XAB_C represents the initial fusion feature obtained by preliminarily fusing XA and XB, Xd represents the parallax (disparity) feature, and XAB represents the final fusion feature.
It should be noted that, in fig. 7, a vertical dotted line divides the adaptive feature fusion process into two sub-processes: extracting the parallax feature, and performing feature fusion using the parallax feature.
In some examples, referring to the left half of the dotted line in fig. 7, the first image feature XA and the second image feature XB may be feature-fused to obtain an initial fusion feature XAB _ C.
The feature fusion described above may be operations such as superposition, addition, multiplication, and the like. The following takes the superposition as an example. At this time, XAB _ C is a superposition feature.
Then, parallax feature extraction can be carried out on the first image feature and the second image feature to obtain a parallax feature Xd; the parallax feature Xd represents a matching degree between a feature point of the first image feature and a feature point of the second image feature.
The degree of matching can be characterized in different dimensions. For example, the degree of match may be characterized by a cost value in some instances. As another example, the degree of matching may be characterized by similarity.
When the matching degree is represented by a cost value, a cost space (cost volume) may be determined based on the first image feature and the second image feature; the cost space characterizes the cost values between the feature points of the first image feature and the feature points of the second image feature. Cost aggregation is then performed on the cost space to obtain the parallax feature.
In some examples, when determining the cost space, the first image feature may first be horizontally aligned with the second image feature. Starting from the minimum disparity of a preset disparity range, the corresponding pair of feature points in the first image feature and the second image feature is determined, and the cost between the two feature points is computed using, for example, the Hamming distance, a norm (Lp) distance, or a gray-scale difference. The set of cost values at the current minimum disparity is then determined from the cost values between all corresponding feature-point pairs in the two features.
The disparity is gradually increased and the above steps are repeated until the cost-value sets at all disparities in the disparity range are obtained; the cost space is then composed from these cost-value sets.
In some examples, in performing cost aggregation on the cost space, the disparity feature may be obtained by performing 3D convolution on the cost space by using a 3D convolution kernel (e.g., a convolution kernel of 3 × 3).
It is understood that the parallax feature may represent a degree of matching between the feature point of the first image feature and the feature point in the second image feature.
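A hedged sketch of cost-volume construction and 3D-convolution cost aggregation is given below. The concatenation-style volume, the handling of the disparity range, and all names are assumptions; the patent does not fix a specific cost computation.

```python
import torch
import torch.nn as nn

def build_cost_volume(xa: torch.Tensor, xb: torch.Tensor, max_disp: int) -> torch.Tensor:
    """Assemble a simple concatenation cost volume over horizontal disparities."""
    n, c, h, w = xa.shape
    cost = xa.new_zeros(n, 2 * c, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            cost[:, :c, d] = xa
            cost[:, c:, d] = xb
        else:
            # Pair each feature point of XA with the XB feature point shifted by disparity d.
            cost[:, :c, d, :, d:] = xa[:, :, :, d:]
            cost[:, c:, d, :, d:] = xb[:, :, :, :-d]
    return cost  # (N, 2C, D, H, W)

class CostAggregation(nn.Module):
    """Cost aggregation with 3 x 3 x 3 3D convolutions to obtain the parallax feature Xd."""
    def __init__(self, channels: int, out_channels: int):
        super().__init__()
        self.agg = nn.Sequential(
            nn.Conv3d(2 * channels, out_channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(out_channels, out_channels, kernel_size=3, padding=1), nn.ReLU())

    def forward(self, cost: torch.Tensor) -> torch.Tensor:
        xd = self.agg(cost)           # (N, C', D, H, W)
        return xd.mean(dim=2)         # collapse the disparity axis -> (N, C', H, W)
```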
In some examples, after obtaining the disparity feature, weight information may be determined based on the disparity feature, and the initial fusion feature may be subjected to feature selection using the weight information to obtain the fusion feature.
In some examples, the weight information may be dot-multiplied with the initial fusion feature to obtain the fusion feature when selecting the feature.
Since the parallax feature characterizes the matching degree between feature points of the first image feature and feature points of the second image feature, determining weight information from it and using that weight information to select features of the initial fusion feature strengthens the strongly correlated features with higher matching degree in the initial fusion feature and weakens the weakly correlated features with lower matching degree. As a result, in the obtained fusion feature, features that match well across the input two-phase images are strengthened and features that match poorly are weakened. In other words, when building change detection is performed based on the final fusion feature, the information provided by matched pixels in the two-phase images is emphasized and the information provided by unmatched pixels is suppressed, so change detection errors caused by the roof-base offset and registration deviation are eliminated, the building change detection accuracy for the two-phase images is further improved, and the false alarm rate is reduced.
In some examples, to further improve the detection accuracy, the disparity feature and the initial fusion feature may be superimposed when determining the weight information, so as to obtain a superimposed feature, and the weight information may be determined based on the superimposed feature.
Referring to the right half of the dotted line in fig. 7, the initial fusion feature XAB _ C and the disparity feature Xd may be superimposed, and then the superimposed feature may be convolved, and the size of the superimposed feature is adjusted to be consistent with the size of XAB _ C, so as to obtain the weight information.
In some examples, the weight information may be normalized for ease of calculation to obtain values in the range of 0-1.
In the above example, information carried by the initial fusion feature itself is introduced when determining the weight information, so that more accurate weight information can be determined, and further the detection accuracy is improved.
In some examples, to further improve the detection accuracy, the sum of the fusion feature and the disparity feature may be determined as a final fusion feature after the fusion feature is obtained.
Referring to the right half of the dotted line in fig. 7, the weight information may be convolved to recover the parallax feature Xd, and Xd is then added to the fusion feature to obtain the final fusion feature XAB.
When Xd is added to the fusion feature, the values of Xd and of the fusion feature at the same positions may be added element-wise to obtain XAB.
In the above example, because the parallax feature can be added to the final fusion feature, when building change detection is performed, feature point matching degree information carried by the parallax feature can be introduced, so that detection accuracy is further improved.
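A sketch of the disparity-guided weighting described above is given below. The sigmoid normalization and the 1x1 projection of Xd are assumptions made so the tensor shapes line up; the patent only requires the weights to lie in the range 0-1 and the shapes to match XAB_C.

```python
import torch
import torch.nn as nn

class DisparityGuidedFusion(nn.Module):
    """Weight the initial fusion feature XAB_C with parallax-derived weights (names are assumptions)."""
    def __init__(self, fuse_channels: int, disp_channels: int):
        super().__init__()
        # Maps the superimposed (XAB_C, Xd) feature to weight information shaped like XAB_C.
        self.weight_conv = nn.Conv2d(fuse_channels + disp_channels, fuse_channels, 3, padding=1)
        # Aligns Xd with XAB_C so the two can be summed at the end.
        self.disp_proj = nn.Conv2d(disp_channels, fuse_channels, 1)

    def forward(self, xab_c: torch.Tensor, xd: torch.Tensor) -> torch.Tensor:
        weights = torch.sigmoid(self.weight_conv(torch.cat([xab_c, xd], dim=1)))  # normalized to 0-1
        selected = weights * xab_c                     # feature selection by point-wise multiplication
        return selected + self.disp_proj(xd)           # final fusion feature XAB
```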
In some examples, adaptive feature fusion may be performed based on Non-local structures.
In some examples, each feature point in the first image feature may be sequentially determined as a current feature point, and: determining similarity between each feature point included in the second image feature and the current feature point, and forming first weight information corresponding to the current feature point; wherein the first weight information represents a similarity between each feature point in the second image feature and the current feature point. Then, feature aggregation may be performed on the second image features using the first weight information to obtain aggregated second image features.
Referring to fig. 8, fig. 8 is a schematic view illustrating a feature aggregation process according to the present application. Schematically, in fig. 8, XA indicates a first image feature, XB indicates a second image feature, and XBF indicates an aggregated image feature obtained by aggregating features of XB.
As shown in fig. 8, each feature point in XA may be sequentially determined as a current feature point (a first feature point in fig. 8), and then each feature point may be traversed from a feature point at the upper left corner of the second image feature (a second feature point in fig. 8), and a similarity between each feature point and the current feature point may be determined. In the calculation of the similarity, methods such as cosine distance, mahalanobis distance, and the like may be used. The similarity calculation method is not particularly limited in the present application.
Then, based on the similarity between the current feature point and each second feature point, first weight information corresponding to the current feature point may be constructed. The weight information may be a weight matrix or a weight vector in some examples.
Then, feature aggregation may be performed on the second image features by using the first weight information, so as to obtain aggregated second image features. In some examples, each feature point of the second image feature may be sequentially determined as a current feature point, and:
the weight information corresponding to the feature point in the first image feature at the same position as the current feature point is determined; a weighted sum over the feature points of the second image feature is computed using that weight information; and the result is fused with the pixel value of the current feature point to obtain a fused pixel value. The fusion operation may include addition, multiplication, replacement, and the like, and is not specifically limited in this application. Proceeding in this way, the feature aggregation of the second image feature into XBF is completed.
In the above example, since the first weight information represents the similarity between each feature point in the second image feature and the current feature point, aggregating the second image feature with the first weight information highlights the features of the second image feature that are strongly associated (highly similar) with the first image feature. The aggregated second image feature therefore strengthens strongly associated features and weakens weakly associated ones, eliminating the change detection error caused by the roof-base offset and registration deviation, further improving the building change detection accuracy for the two-phase images, and reducing the false alarm rate.
Similarly, feature aggregation of XA can be performed in the manner described above to obtain the aggregated first image feature XAF.
The aggregated first image feature XAF and the second image feature XBF may then be superimposed to obtain the above-described fused feature.
Because the aggregated first image feature and the aggregated second image feature strengthen the features that are strongly associated between the two and weaken the weakly associated features, performing change detection on their superimposed features emphasizes the information provided by matched pixels in the two-phase images and suppresses the information provided by unmatched pixels. This eliminates the change detection error caused by the roof-base offset and registration deviation, improves the building change detection accuracy for the two-phase images, and reduces the false alarm rate.
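A simplified non-local aggregation sketch follows. The dot-product similarity, softmax normalization, and additive fusion are assumptions, since the text leaves the similarity measure and the fusion operation open; note also that the all-pairs weight matrix is O((HW)^2) and is only practical for small feature maps.

```python
import torch

def nonlocal_aggregate(query: torch.Tensor, value: torch.Tensor) -> torch.Tensor:
    """Aggregate `value` (e.g. XB) using similarities between `query` (e.g. XA) points and `value` points."""
    n, c, h, w = query.shape
    q = query.flatten(2).transpose(1, 2)                               # (N, HW, C)
    v = value.flatten(2).transpose(1, 2)                               # (N, HW, C)
    # Weight information: similarity of every value point to each query point.
    weights = torch.softmax(q @ v.transpose(1, 2) / c ** 0.5, dim=-1)  # (N, HW, HW)
    aggregated = weights @ v                                           # weighted sum over value points
    # Fuse the aggregation result with the original value at the same position (addition here).
    return (aggregated + v).transpose(1, 2).reshape(n, c, h, w)

# xbf = nonlocal_aggregate(xa, xb); xaf = nonlocal_aggregate(xb, xa)
# fused = torch.cat([xaf, xbf], dim=1)   # superposition of the aggregated features
```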
In some examples, to improve the accuracy and robustness of change detection, the similarity may be computed between a second neighborhood corresponding to each feature point included in the second image feature and a first neighborhood corresponding to the current feature point. The first neighborhood is a neighborhood of a first preset size formed by the current feature point as its center together with the surrounding feature points; the second neighborhood is a neighborhood of a second preset size formed by the feature point in the second image feature as its center together with the surrounding feature points.
The first predetermined size and the second predetermined size may be empirically set values. The first predetermined size and the second predetermined size may be the same or different. In some examples, the first predetermined size may be 1, and the second predetermined size may be 3 × 3.
Because the neighborhood can comprise a larger receptive field, the similarity calculated based on the neighborhood can be more accurate, so that more accurate weight information is determined, and the accuracy and robustness of change detection are improved.
In some examples, to reduce the amount of model computation and improve change detection efficiency, the adaptive feature fusion of the first image feature and the second image feature may proceed as follows. Each feature point in the first image feature is determined in turn as the current feature point, and: each feature point included in a third neighborhood, corresponding to the feature point in the second image feature at the same position as the current feature point, is determined; the similarity between each of these feature points and the current feature point is computed; and third weight information corresponding to the current feature point is formed. The third weight information represents the similarity between each feature point in the third neighborhood and the current feature point; the third neighborhood is a neighborhood of a third preset size formed by the feature point in the second image feature at the same position as the current feature point as its center together with the surrounding feature points.
Then, feature aggregation may be performed on the second image feature using the third weight information to obtain an aggregated second image feature.
In some examples, each feature point of the second image feature may be sequentially determined as a current feature point, and:
the weight information corresponding to the feature point in the first image feature at the same position as the current feature point is determined; a weighted sum is computed, using that weight information, over the feature points in a neighborhood of the third preset size centered on the current feature point; and the result is fused with the pixel value of the current feature point to obtain a fused pixel value. The fusion operation may include addition, multiplication, replacement, and the like, and is not specifically limited in this application. Proceeding in this way, the feature aggregation of the second image feature into XBF is completed.
Similarly, feature aggregation of XA can be performed in the manner described above to obtain the aggregated first image feature XAF.
The aggregated first image feature XAF and the second image feature XBF may then be superimposed to obtain the above-described fused feature.
The above example improves on the non-local structure: during adaptive fusion, the receptive field is restricted to the points in a neighborhood around the current feature point, which reduces the amount of computation in the adaptive fusion process and improves change detection efficiency.
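The neighborhood-restricted variant can be sketched with torch.nn.functional.unfold, limiting each point's receptive field to a k x k window of the other feature. The window size and the softmax weighting are assumptions made for the sketch.

```python
import torch
import torch.nn.functional as F

def windowed_aggregate(query: torch.Tensor, value: torch.Tensor, k: int = 3) -> torch.Tensor:
    """Each query point attends only to a k x k neighborhood of `value` at the same position."""
    n, c, h, w = query.shape
    pad = k // 2
    # Unfold k x k neighborhoods of `value` around every spatial position: (N, C, k*k, H*W).
    patches = F.unfold(value, kernel_size=k, padding=pad).view(n, c, k * k, h * w)
    q = query.flatten(2).unsqueeze(2)                         # (N, C, 1, HW)
    sims = (q * patches).sum(dim=1) / c ** 0.5                # (N, k*k, HW) similarities
    weights = torch.softmax(sims, dim=1)
    aggregated = (patches * weights.unsqueeze(1)).sum(dim=2)  # (N, C, HW)
    # Fuse the aggregation result with the original value at the same position (addition here).
    return (aggregated + value.flatten(2)).view(n, c, h, w)
```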
A building change detection method is also presented in the present application. The method can be applied to any electronic device. The type of the electronic device may be a notebook computer, a server, a mobile phone, a PAD terminal, etc., and is not particularly limited in this application.
It is understood that the change detection method may be performed by only the client device or the server device, or may be performed by the client device and the server device in cooperation.
For example, the change detection method described above may be integrated into the client device. The client device can provide computing power through a hardware environment of the client device to execute the image processing method.
For another example, the change detection method described above may be integrated into a system platform. The server device carrying the system platform can provide calculation power through the hardware environment of the server device to execute the image processing method.
For another example, the change detection method described above may be divided into two tasks: acquiring images and detecting changes. The acquisition task may be integrated into the client device and the change detection task into the server device. After acquiring the two-phase images, the client device may initiate a change detection request to the server device, and the server device, upon receiving the request, performs building change detection on the target two-phase images in response.
Referring to fig. 9, fig. 9 is a flowchart illustrating a method for detecting a building change according to the present application.
As shown in fig. 9, the method may include:
s902, a two-stage image containing a building is acquired.
And S904, respectively carrying out feature extraction on the acquired two-stage images through a building change detection network to obtain a first image feature and a second image feature.
S906, based on the first image feature and the second image feature, performs change detection on the building included in the two-stage image, and obtains a result of the change detection on the building.
Illustratively, when the building change detection task further includes an image processing task for the building, the building change detection network may further perform image processing on at least one of the two acquired images to obtain a corresponding image processing result.
In the above-mentioned image processing task and/or building change detection task, the specific implementation method may refer to the methods mentioned in fig. 1 to 8.
The building change detection network is obtained by training by using a training method shown in any one of the examples.
In some examples, in performing S906, a fused feature may be obtained based on feature fusion of the first image feature and the second image feature. Then, based on the fusion features, the change detection of the building included in the two-stage images can be performed, and the change detection result of the building can be obtained.
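A hypothetical driver for the S902-S906 flow is sketched below. The attribute names on net (feature_extractor, fusion_unit, detection_unit, image_processing_head) are assumptions, not the actual network interface of the detection network.

```python
import torch

def detect_building_changes(net, image_a: torch.Tensor, image_b: torch.Tensor):
    """Run feature extraction, adaptive fusion and pixel-level change detection on a two-phase pair."""
    net.eval()
    with torch.no_grad():
        # S904: shared (twin) feature extraction for the two-phase images.
        xa = net.feature_extractor(image_a)
        xb = net.feature_extractor(image_b)
        # S906: adaptive feature fusion followed by pixel-level change detection.
        fused = net.fusion_unit(xa, xb)
        change_mask = net.detection_unit(fused)          # per-pixel change probabilities
        # Optional image processing head (e.g. building segmentation) on one phase.
        segmentation = net.image_processing_head(xa)
    return change_mask, segmentation
```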
The following examples are described with reference to building change detection scenarios.
In this scenario, the detection device may periodically acquire remote sensing images taken by the satellites for the same location. The detection device can then perform building change detection on the acquired remote sensing image and output a detection result.
A building change detection network (hereinafter simply the detection network) trained with the training method described in any of the above examples is mounted in the detection device in advance. The network training process is not described in detail here.
The structure of the detection network can be seen in fig. 3. The network may include a twin (parameter-sharing) feature extraction sub-network and a detection sub-network connected to it. The detection sub-network may include a fusion unit and a detection unit.
The fusion unit may use an adaptive feature fusion method as shown in any of the previous examples. The detection unit may be a semantic segmentation network at the pixel level, and may determine whether a building has changed at the pixel level.
The detection device is assumed to acquire a remote sensing image A and a remote sensing image B.
At this time, the detection device may input the two-stage image into the detection network, and perform feature extraction on a and B by using the trained twin feature extraction sub-network, to obtain a first image feature XA and a second image feature XB.
In the training process, the twin feature extraction sub-network is obtained based on joint training, so that features more beneficial to change detection can be extracted, and the change detection accuracy is improved.
The XA and XB may then be adaptively feature-fused in the fusion unit. For example, a trained deformable convolution network may be used to perform deformable convolution on XA and XB, obtaining the adjusted first image feature XA_d and second image feature XB_d and correcting the misalignment between the roof and the base of the building. The trained FlowNetCorr can then predict the optical flow field between XA_d and XB_d, and this optical flow field is used to warp XA_d so that the features are aligned, eliminating the registration deviation introduced when A and B were registered. Finally, the warped first image feature and XB_d can be superimposed to obtain the fusion feature XAB, completing the adaptive feature fusion.
The self-adaptive feature fusion can eliminate the offset and the image registration deviation, so that the change detection accuracy can be improved, and the false alarm rate is reduced.
It is understood that the fusion unit can also perform adaptive fusion by using disparity estimation, non-local approach and non-local improvement approach to eliminate the offset and image registration deviation, which is not described in detail herein.
Finally, the XAB can be input into the detection unit to complete semantic segmentation at the pixel level, and a detection result is obtained.
In some examples, the detection device may display the detection result on a display page. When the detection result is displayed, a bounding box corresponding to each building that has changed can be displayed.
In some examples, if any building is detected to be changed, the detection device may further construct warning information and send the warning information to a monitoring party.
The application also provides an image processing device.
Referring to fig. 10, fig. 10 is a schematic structural diagram of an image processing apparatus according to the present application.
An image processing apparatus 100, said apparatus 100 comprising:
the feature extraction module 101 is configured to perform feature extraction on a two-stage image of an input target change detection network to obtain a first image feature and a second image feature;
an image processing module 102, configured to obtain, based on the first image feature and the second image feature, an image processing result for at least one of the two-phase images and a change detection result for the target in the two-phase image, which are output by the target change detection network.
In some illustrated embodiments, the feature extraction module 101 is specifically configured to:
respectively extracting the features of the two-stage images to obtain a first image feature corresponding to a first image in the two-stage images and a second image feature corresponding to a second image in the two-stage images;
the image processing module 102 includes:
the first image processing submodule is used for carrying out image processing on the first image characteristic and/or the second image characteristic to obtain an image processing result;
and the second image processing submodule is used for performing feature fusion on the first image features and the second image features to obtain fusion features, and performing change detection on the target contained in the two-stage images based on the fusion features to obtain the target change detection result.
In some embodiments shown, the second image processing sub-module comprises:
and the fusion module is used for performing self-adaptive feature fusion on the first image feature and the second image feature to obtain a fusion feature.
In some illustrated embodiments, the fusion module is specifically configured to:
predicting a first offset amount corresponding to each feature point in the second image feature and a second offset amount corresponding to each feature point in the first image feature using the first image feature and the second image feature; wherein, the first offset is used for adjusting the receptive field of each feature point in the second image feature; the second offset is used for adjusting the receptive field of each feature point in the first image feature;
adjusting the second image characteristic based on the first offset to obtain an adjusted second image characteristic;
adjusting the first image characteristic based on the second offset to obtain an adjusted first image characteristic;
and performing feature fusion on the adjusted first image features and the adjusted second image features to obtain the fusion features.
In some illustrated embodiments, the fusion module is specifically configured to:
superposing the first image characteristic and the second image characteristic to obtain a first superposed characteristic; performing offset prediction on the first superposition characteristic to obtain the first offset; and
superposing the first image characteristic and the second image characteristic to obtain a second superposed characteristic; the second offset is obtained by convolving the second superposition characteristic.
In some illustrated embodiments, the fusion module is specifically configured to:
performing optical flow prediction on the adjusted first image characteristic and the adjusted second image characteristic, and determining a corresponding optical flow field; wherein, the optical flow field represents the position error of the same pixel point in the two-stage images;
the optical flow field is utilized to distort the first image characteristic, and a distorted first image characteristic is obtained;
and performing feature fusion on the distorted first image features and the adjusted second image features to obtain the fusion features.
In some illustrated embodiments, the fusion module is specifically configured to:
coding the adjusted first image characteristics and the adjusted second image characteristics by utilizing a plurality of coding layers included by the optical flow estimation network to obtain a coding result;
decoding the coding result by utilizing a plurality of decoding layers included by the optical flow estimation network to obtain the optical flow field; in the decoding process, starting from a second layer decoding layer, the input of each decoding layer comprises: the output characteristics output by the decoding layer of the previous layer, the optical flow predicted based on the output characteristics and the characteristics output by the coding layer corresponding to the decoding layer.
In some illustrated embodiments, the fusion module is specifically configured to:
performing feature fusion on the first image features and the second image features to obtain initial fusion features;
performing parallax feature extraction on the first image feature and the second image feature to obtain a parallax feature; the parallax features represent the matching degree between the feature points of the first image features and the feature points of the second image features;
and determining weight information based on the parallax features, and performing feature selection on the initial fusion features by using the weight information to obtain the fusion features.
In some illustrated embodiments, the fusion module is specifically configured to:
and superposing the parallax features and the initial fusion features to obtain superposition features, and determining weight information based on the superposition features.
In some illustrated embodiments, the fusion module is specifically configured to:
and determining the sum of the fusion feature and the parallax feature as a final fusion feature.
In some embodiments shown, the degree of match is characterized by a cost value; the fusion module is specifically configured to:
determining a cost space based on the first image feature and the second image feature; the cost space is used for representing cost values between feature points of the first image feature and feature points of the second image feature;
and performing cost aggregation on the cost space to obtain the parallax feature.
In some illustrated embodiments, the fusion module is specifically configured to:
and sequentially determining each feature point in the first image feature as a current feature point, and executing: determining similarity between each feature point included in the second image feature and the current feature point, and forming first weight information corresponding to the current feature point; wherein, the first weight information represents the similarity between each feature point in the second image feature and the current feature point;
performing feature aggregation on the second image features by using the first weight information to obtain aggregated second image features;
and sequentially determining each feature point in the second image feature as a current feature point, and executing: determining similarity between each feature point included in the first image feature and the current feature point, and forming second weight information corresponding to the current feature point; the second weight information represents the similarity between each feature point in the first image feature and the current feature point;
performing feature aggregation on the first image features by using the second weight information to obtain aggregated first image features;
and overlapping the aggregated first image features and the second image features to obtain the fusion features.
In some illustrated embodiments, the fusion module is specifically configured to:
determining the similarity between a second neighborhood corresponding to each feature point included in the second image feature and a first neighborhood corresponding to the current feature point; the first neighborhood comprises a neighborhood with a first preset size, which is formed by taking the current feature point as a center and other feature points around the current feature point; the second neighborhood region includes a neighborhood region of a second predetermined size formed by the second image feature point as a center and other feature points around the second image feature point.
In some illustrated embodiments, the fusion module is specifically configured to:
and sequentially determining each feature point in the first image feature as a current feature point, and executing: determining each feature point included in a third neighborhood corresponding to the feature point with the same position as the current feature point in the second image feature, respectively determining the similarity between each feature point and the current feature point, and forming third weight information corresponding to the current feature point; wherein, the third weight information represents the similarity between each feature point in the third neighborhood and the current feature point; the third neighborhood comprises a neighborhood with a third preset size, which is formed by taking a feature point in the second image feature, which is the same as the current feature point in position, as a center and other feature points around the feature point;
performing feature aggregation on the second image features by using the third weight information to obtain aggregated second image features;
and sequentially determining each feature point in the second image feature as a current feature point, and executing: determining each feature point included in a fourth neighborhood corresponding to the feature point in the first image feature at the same position as the current feature point, respectively determining the similarity between each of these feature points and the current feature point, and forming fourth weight information corresponding to the current feature point; wherein the fourth weight information represents the similarity between each feature point in the fourth neighborhood and the current feature point; the fourth neighborhood comprises a neighborhood of a fourth preset size formed by taking the feature point in the first image feature at the same position as the current feature point as the center together with the surrounding feature points;
performing feature aggregation on the first image features by using the fourth weight information to obtain aggregated first image features;
and overlapping the aggregated first image features and the second image features to obtain the fusion features.
In some embodiments shown, the image processing comprises at least one of:
classifying the images; semantic segmentation; example segmentation; dividing a panorama; and detecting the target.
In some illustrated embodiments, the apparatus 100 further comprises:
the data enhancement module is used for enhancing data of at least one period of image in the two periods of images in at least one of the following modes:
cropping the image; rotating the image; flipping the image; adjusting the brightness of the image; adjusting the contrast of the image; adding Gaussian noise to the image; and adding registration errors to the two-phase images.
In some illustrative embodiments, the target change detection network includes a first input and a second input; the image processing module 102 is specifically configured to:
obtaining a first image processing result for at least one of the images in two phases and a first change detection result for a target in the image in two phases, which are output by the target change detection network, by using the first image feature as a first input and the second image feature as a second input;
obtaining a second image processing result for at least one of the images in two phases and a second change detection result for the target in the image in two phases, which are output by the target change detection network, by using the first image feature as a second input and the second image feature as a first input;
and performing weighted average on the first image processing result and the second image processing result to obtain a final image processing result for at least one stage image in the two-stage images, and performing weighted average on the first change detection result and the second change detection result to obtain a final change detection result for the target in the two-stage images.
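A sketch of the input-order swapping with weighted averaging described above is given below. The equal default weights and the (processing result, change result) return convention of net are assumptions made for illustration.

```python
import torch

def swap_averaged_inference(net, image_a, image_b, w1: float = 0.5, w2: float = 0.5):
    """Average predictions over both input orders to remove input-position bias."""
    with torch.no_grad():
        proc_1, change_1 = net(image_a, image_b)   # first image feature as first input
        proc_2, change_2 = net(image_b, image_a)   # inputs swapped
    # Weighted average of the two passes gives the final results.
    processing = w1 * proc_1 + w2 * proc_2
    change = w1 * change_1 + w2 * change_2
    return processing, change
```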
In some embodiments, in the case that the target change detection network is a network to be trained, the apparatus 100 further includes:
a loss determining module, configured to obtain an image processing loss according to the image processing result of the at least one stage image and a corresponding image processing tag value, and obtain a change detection loss according to a change detection result of a target in the two stage images and a corresponding change detection tag value;
and the adjusting module is used for adjusting the network parameters of the target change detection network based on the image processing loss and the change detection loss.
In some illustrated embodiments, the adjusting module is specifically configured to:
obtaining a total loss according to the image processing loss and the change detection loss;
and adjusting the network parameters of the target change detection network based on the total loss.
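One joint training step combining the image processing loss and the change detection loss might look as follows; the loss criteria and the weighting factor alpha are assumptions, since the patent only states that the total loss is obtained from the two losses.

```python
import torch

def training_step(net, optimizer, image_a, image_b, proc_label, change_label,
                  proc_criterion, change_criterion, alpha: float = 1.0):
    """One joint-training step for the target change detection network (illustrative sketch)."""
    optimizer.zero_grad()
    proc_pred, change_pred = net(image_a, image_b)
    proc_loss = proc_criterion(proc_pred, proc_label)          # image processing loss
    change_loss = change_criterion(change_pred, change_label)  # change detection loss
    total_loss = proc_loss + alpha * change_loss               # total loss
    total_loss.backward()                                      # adjust network parameters
    optimizer.step()
    return total_loss.item()
```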
The embodiment of the image processing apparatus shown in the present application can be applied to an electronic device. Accordingly, the present application discloses an electronic device, which may comprise: a processor.
A memory for storing processor-executable instructions.
Wherein the processor is configured to call the executable instructions stored in the memory to implement the image processing method as shown in any of the above embodiments.
Referring to fig. 11, fig. 11 is a hardware structure diagram of an electronic device shown in the present application.
As shown in fig. 11, the electronic device may include a processor for executing instructions, a network interface for making network connections, a memory for storing operation data for the processor, and a non-volatile memory for storing instructions corresponding to the image processing apparatus.
The embodiment of the image processing apparatus may be implemented by software, or may be implemented by hardware, or a combination of hardware and software. Taking a software implementation as an example, as a logical device, the device is formed by reading, by a processor of the electronic device where the device is located, a corresponding computer program instruction in the nonvolatile memory into the memory for operation. In terms of hardware, in addition to the processor, the memory, the network interface, and the nonvolatile memory shown in fig. 11, the electronic device in which the apparatus is located in the embodiment may also include other hardware according to an actual function of the electronic device, which is not described again.
It is to be understood that, in order to increase the processing speed, the corresponding instructions of the image processing apparatus may also be directly stored in the memory, which is not limited herein.
The present application proposes a computer-readable storage medium storing a computer program for executing the image processing method as shown in any one of the above embodiments.
One skilled in the art will recognize that one or more embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (which may include, but are not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
"and/or" in this application means having at least one of the two, for example, "a and/or B" may include three schemes: A. b, and "A and B".
The embodiments in the present application are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the data processing apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to part of the description of the method embodiment.
Specific embodiments of the present application have been described above. Other embodiments are within the scope of the following claims. In some cases, the acts or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Embodiments of the subject matter and functional operations described in this application may be implemented in the following: digital electronic circuitry, tangibly embodied computer software or firmware, computer hardware that may include the structures disclosed in this application and their structural equivalents, or combinations of one or more of them. Embodiments of the subject matter described in this application can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode and transmit information to suitable receiver apparatus for execution by the data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
The processes and logic flows described in this application can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows described above can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Computers suitable for executing computer programs may include, for example, general and/or special purpose microprocessors, or any other type of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory and/or a random access memory. The basic components of a computer may include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer does not necessarily have such a device. Moreover, a computer may be embedded in another device, e.g., a mobile telephone, a Personal Digital Assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.
Computer-readable media suitable for storing computer program instructions and data can include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disk or removable disks), magneto-optical disks, and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
Although this application contains many specific implementation details, these should not be construed as limiting the scope of any disclosure or of what may be claimed, but rather as merely describing features of particular disclosed embodiments. Certain features that are described in this application in the context of separate embodiments can also be implemented in combination in a single embodiment. In other instances, features described in connection with one embodiment may be implemented as discrete components or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. Further, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.
The above description is only for the purpose of illustrating the preferred embodiments of the present application and is not intended to limit the present application to the particular embodiments of the present application, and any modifications, equivalents, improvements, etc. made within the spirit and principles of the present application should be included within the scope of the present application.

Claims (22)

1. An image processing method, characterized in that the method comprises:
performing feature extraction on the two-stage image input into the target change detection network to obtain a first image feature and a second image feature;
and obtaining an image processing result aiming at least one stage of image in the two stages of images and a change detection result aiming at the target in the two stages of images, which are output by the target change detection network, based on the first image characteristic and the second image characteristic.
2. The method according to claim 1, wherein the performing feature extraction on the two-phase image input to the change detection network to obtain a first image feature and a second image feature comprises:
respectively extracting features of the two-stage images to obtain a first image feature corresponding to a first image in the two-stage images and a second image feature corresponding to a second image in the two-stage images;
the obtaining, based on the first image feature and the second image feature, an image processing result for at least one of the images in the two phases and a target change detection result for the image in the two phases output by the target change detection network includes:
performing image processing on the first image characteristic and/or the second image characteristic to obtain an image processing result;
and performing feature fusion on the first image features and the second image features to obtain fusion features, and performing change detection on the target contained in the two-stage images based on the fusion features to obtain a target change detection result.
3. The method of claim 2, wherein the feature fusing the first image feature and the second image feature to obtain a fused feature comprises:
and performing self-adaptive feature fusion on the first image feature and the second image feature to obtain a fusion feature.
4. The method of claim 3, wherein the adaptively feature fusing the first image feature and the second image feature to obtain a fused feature comprises:
predicting a first offset corresponding to each feature point in the second image feature and a second offset corresponding to each feature point in the first image feature using the first image feature and the second image feature; the first offset is used for adjusting the receptive field of each feature point in the second image feature; the second offset is used for adjusting the receptive field of each feature point in the first image feature;
adjusting the second image characteristic based on the first offset to obtain an adjusted second image characteristic;
adjusting the first image characteristic based on the second offset to obtain an adjusted first image characteristic;
and performing feature fusion on the adjusted first image features and the adjusted second image features to obtain the fusion features.
5. The method of claim 4, wherein the predicting, using the first image feature and the second image feature, a first offset corresponding to each feature point in the second image feature and a second offset corresponding to each feature point in the first image feature comprises:
overlapping the first image characteristic and the second image characteristic to obtain a first overlapping characteristic; performing offset prediction on the first overlapping characteristic to obtain the first offset; and
overlapping the first image characteristic and the second image characteristic to obtain a second overlapping characteristic; and performing convolution on the second superposition characteristic to obtain the second offset.
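One way to realize the offset-based receptive-field adjustment of claims 4 and 5 is with deformable convolution, in which offsets predicted from the superposed features shift the sampling locations of each feature point. The sketch below assumes this reading; DeformableFusion, its channel counts, and the use of torchvision's DeformConv2d are illustrative assumptions, not the patent's own implementation.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableFusion(nn.Module):
    def __init__(self, ch=64, k=3):
        super().__init__()
        # Offsets are predicted from the superposed (concatenated) pair of features.
        self.offset1 = nn.Conv2d(2 * ch, 2 * k * k, 3, padding=1)  # first offset, for f2
        self.offset2 = nn.Conv2d(2 * ch, 2 * k * k, 3, padding=1)  # second offset, for f1
        self.deform1 = DeformConv2d(ch, ch, k, padding=k // 2)
        self.deform2 = DeformConv2d(ch, ch, k, padding=k // 2)
        self.fuse = nn.Conv2d(2 * ch, ch, 1)

    def forward(self, f1, f2):
        pair = torch.cat([f1, f2], dim=1)
        off1 = self.offset1(pair)        # adjusts the receptive field of f2's feature points
        off2 = self.offset2(pair)        # adjusts the receptive field of f1's feature points
        f2_adj = self.deform1(f2, off1)  # adjusted second image feature
        f1_adj = self.deform2(f1, off2)  # adjusted first image feature
        return self.fuse(torch.cat([f1_adj, f2_adj], dim=1))
```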
6. The method according to claim 4 or 5, wherein the performing feature fusion on the adjusted first image feature and the adjusted second image feature to obtain the fused feature comprises:
performing optical flow prediction on the adjusted first image feature and the adjusted second image feature to determine a corresponding optical flow field; wherein the optical flow field represents the position error of the same pixel point in the two-phase images;
warping the first image feature by using the optical flow field to obtain a warped first image feature;
and performing feature fusion on the warped first image feature and the adjusted second image feature to obtain the fused feature.
7. The method of claim 6, wherein performing optical flow prediction on the adjusted first image feature and the adjusted second image feature to determine a corresponding optical flow field comprises:
encoding the adjusted first image feature and the adjusted second image feature by using a plurality of encoding layers included in an optical flow estimation network to obtain an encoding result;
decoding the encoding result by using a plurality of decoding layers included in the optical flow estimation network to obtain the optical flow field; wherein, in the decoding process, starting from the second decoding layer, the input of each decoding layer comprises: the output feature of the previous decoding layer, the optical flow predicted based on that output feature, and the feature output by the encoding layer corresponding to the decoding layer.
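The warping step of claim 6 amounts to resampling one feature map along the predicted optical flow field. Below is a hedged sketch, assuming the flow stores per-pixel (dx, dy) displacements in pixels and that bilinear sampling is acceptable; flow_warp is an illustrative helper, not the patent's code.

```python
import torch
import torch.nn.functional as F

def flow_warp(feature, flow):
    """Warp `feature` (N, C, H, W) with `flow` (N, 2, H, W) via bilinear sampling."""
    n, _, h, w = feature.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack([xs, ys], dim=0).float().to(feature.device)   # (2, H, W)
    coords = grid.unsqueeze(0) + flow                                 # shift by the flow
    # Normalize to [-1, 1] as required by grid_sample.
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    sample_grid = torch.stack([coords_x, coords_y], dim=-1)           # (N, H, W, 2)
    return F.grid_sample(feature, sample_grid, align_corners=True)
```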
8. The method of claim 3, wherein the performing adaptive feature fusion on the first image feature and the second image feature to obtain the fused feature comprises:
performing feature fusion on the first image feature and the second image feature to obtain an initial fused feature;
performing disparity feature extraction on the first image feature and the second image feature to obtain a disparity feature; wherein the disparity feature characterizes a degree of matching between feature points of the first image feature and feature points in the second image feature;
and determining weight information based on the disparity feature, and performing feature selection on the initial fused feature by using the weight information to obtain the fused feature.
9. The method of claim 8, wherein determining weight information based on the disparity feature comprises:
superposing the disparity feature and the initial fused feature to obtain a superposed feature, and determining the weight information based on the superposed feature.
10. The method according to claim 8 or 9, characterized in that the method further comprises:
determining the sum of the fused feature and the disparity feature as a final fused feature.
11. The method according to any one of claims 8-10, wherein the degree of matching is characterized by a cost value;
the performing disparity feature extraction on the first image feature and the second image feature to obtain the disparity feature comprises:
determining a cost space based on the first image feature and the second image feature; wherein the cost space is used for characterizing cost values between feature points of the first image feature and feature points in the second image feature;
and performing cost aggregation on the cost space to obtain the disparity feature.
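Claims 8 and 11 describe building a cost space from pairwise matching costs and aggregating it into a disparity feature. The sketch below assumes a correlation-style cost over a small horizontal search range and convolutional cost aggregation; the search range, the similarity measure, and the layer shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

def build_cost_space(f1, f2, max_disp=4):
    """Correlate f1 with horizontally shifted copies of f2: (N, C, H, W) -> (N, D, H, W)."""
    costs = []
    for d in range(-max_disp, max_disp + 1):
        f2_shift = torch.roll(f2, shifts=d, dims=3)        # shift along the width axis
        costs.append((f1 * f2_shift).mean(dim=1))          # per-pixel matching cost
    return torch.stack(costs, dim=1)

class DisparityFeature(nn.Module):
    def __init__(self, max_disp=4, out_ch=64):
        super().__init__()
        d = 2 * max_disp + 1
        self.aggregate = nn.Sequential(                     # cost aggregation
            nn.Conv2d(d, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
        )
        self.max_disp = max_disp

    def forward(self, f1, f2):
        cost = build_cost_space(f1, f2, self.max_disp)      # cost space
        return self.aggregate(cost)                         # disparity feature
```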
12. The method of claim 3, wherein the performing adaptive feature fusion on the first image feature and the second image feature to obtain the fused feature comprises:
sequentially determining each feature point in the first image feature as a current feature point, and performing the following: determining the similarity between each feature point included in the second image feature and the current feature point, and forming first weight information corresponding to the current feature point; wherein the first weight information characterizes the similarity between each feature point in the second image feature and the current feature point;
performing feature aggregation on the second image feature by using the first weight information to obtain an aggregated second image feature;
and sequentially determining each feature point in the second image feature as a current feature point, and performing the following: determining the similarity between each feature point included in the first image feature and the current feature point, and forming second weight information corresponding to the current feature point; wherein the second weight information characterizes the similarity between each feature point in the first image feature and the current feature point;
performing feature aggregation on the first image feature by using the second weight information to obtain an aggregated first image feature;
and superposing the aggregated first image feature and the aggregated second image feature to obtain the fused feature.
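Claim 12 can be read as a bidirectional cross-attention: every feature point of one feature weights and aggregates all feature points of the other, and the two aggregated features are superposed. The sketch below assumes dot-product similarity and softmax-normalized weights, neither of which the claim specifies.

```python
import torch
import torch.nn.functional as F

def cross_aggregate(f1, f2):
    """f1, f2: (N, C, H, W). Returns one possible fused feature for claim 12."""
    n, c, h, w = f1.shape
    q1 = f1.flatten(2).transpose(1, 2)          # (N, HW, C) feature points of f1
    q2 = f2.flatten(2).transpose(1, 2)          # (N, HW, C) feature points of f2
    sim = torch.bmm(q1, q2.transpose(1, 2))     # (N, HW, HW) pointwise similarities
    w12 = F.softmax(sim, dim=2)                 # first weight information (f1 -> f2)
    w21 = F.softmax(sim.transpose(1, 2), dim=2) # second weight information (f2 -> f1)
    f2_agg = torch.bmm(w12, q2)                 # aggregated second image feature
    f1_agg = torch.bmm(w21, q1)                 # aggregated first image feature
    fused = torch.cat([f1_agg, f2_agg], dim=2)  # superpose the two aggregated features
    return fused.transpose(1, 2).reshape(n, 2 * c, h, w)
```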
13. The method according to claim 12, wherein the determining the similarity between each feature point included in the second image feature and the current feature point comprises:
determining the similarity between a second neighborhood corresponding to each feature point included in the second image feature and a first neighborhood corresponding to the current feature point; wherein the first neighborhood comprises a neighborhood of a first preset size formed by taking the current feature point as a center together with other feature points around the current feature point; and the second neighborhood comprises a neighborhood of a second preset size formed by taking the feature point in the second image feature as a center together with other feature points around that feature point.
14. The method of claim 3, wherein the performing adaptive feature fusion on the first image feature and the second image feature to obtain the fused feature comprises:
sequentially determining each feature point in the first image feature as a current feature point, and performing the following: determining the similarity between the current feature point and each feature point included in a third neighborhood corresponding to the feature point in the second image feature at the same position as the current feature point, and forming third weight information corresponding to the current feature point; wherein the third weight information characterizes the similarity between each feature point in the third neighborhood and the current feature point; and the third neighborhood comprises a neighborhood of a third preset size formed by taking the feature point in the second image feature at the same position as the current feature point as a center together with other feature points around that feature point;
performing feature aggregation on the second image feature by using the third weight information to obtain an aggregated second image feature;
and sequentially determining each feature point in the second image feature as a current feature point, and performing the following: determining the similarity between the current feature point and each feature point included in a fourth neighborhood corresponding to the feature point in the first image feature at the same position as the current feature point, and forming fourth weight information corresponding to the current feature point; wherein the fourth weight information characterizes the similarity between each feature point in the fourth neighborhood and the current feature point; and the fourth neighborhood comprises a neighborhood of a fourth preset size formed by taking the feature point in the first image feature at the same position as the current feature point as a center together with other feature points around that feature point;
performing feature aggregation on the first image feature by using the fourth weight information to obtain an aggregated first image feature;
and superposing the aggregated first image feature and the aggregated second image feature to obtain the fused feature.
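Claim 14 restricts the aggregation of claim 12 to a neighborhood around the co-located feature point. Below is a sketch under the assumptions of a k x k window, dot-product similarity, and softmax weights; the helper local_aggregate and the window size are illustrative, not taken from the patent.

```python
import torch
import torch.nn.functional as F

def local_aggregate(f_query, f_other, k=3):
    """Aggregate f_other within a k x k neighborhood of each co-located point of f_query."""
    n, c, h, w = f_query.shape
    pad = k // 2
    # (N, C, k*k, H*W): the k x k neighborhood of every position in f_other.
    windows = F.unfold(f_other, kernel_size=k, padding=pad).view(n, c, k * k, h * w)
    query = f_query.view(n, c, 1, h * w)
    sim = (query * windows).sum(dim=1)                  # (N, k*k, H*W) similarities
    weights = F.softmax(sim, dim=1)                     # third/fourth weight information
    agg = (windows * weights.unsqueeze(1)).sum(dim=2)   # (N, C, H*W) aggregated feature
    return agg.view(n, c, h, w)

# Claim 14's fused feature would then superpose local_aggregate(f1, f2) and
# local_aggregate(f2, f1), for example by channel concatenation.
```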
15. The method according to any of claims 1-14, wherein the image processing comprises at least one of:
image classification; semantic segmentation; instance segmentation; panoptic segmentation; and target detection.
16. The method according to any one of claims 1-15, further comprising:
performing data augmentation on at least one of the two-phase images in at least one of the following manners:
cropping the image; rotating the image; flipping the image; adjusting the brightness of the image; adjusting the contrast of the image; adding Gaussian noise to the image; and adding a registration error to the two-phase images.
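For claim 16, the listed augmentations map onto standard image operations; a sketch follows, using torchvision transforms where they exist. Applying the registration error as a small random shift of only one phase is an assumed implementation, and all magnitudes are illustrative.

```python
import random
import torch
import torchvision.transforms.functional as TF

def augment_pair(img1, img2):
    """img1, img2: (C, H, W) tensors of the two-phase images."""
    if random.random() < 0.5:                      # flip both phases consistently
        img1, img2 = TF.hflip(img1), TF.hflip(img2)
    angle = random.choice([0, 90, 180, 270])       # rotation
    img1, img2 = TF.rotate(img1, angle), TF.rotate(img2, angle)
    img1 = TF.adjust_brightness(img1, 1.0 + random.uniform(-0.2, 0.2))
    img2 = TF.adjust_contrast(img2, 1.0 + random.uniform(-0.2, 0.2))
    img1 = img1 + 0.01 * torch.randn_like(img1)    # Gaussian noise on one phase
    # Simulated registration error: shift one phase by a few pixels.
    dx, dy = random.randint(-3, 3), random.randint(-3, 3)
    img2 = torch.roll(img2, shifts=(dy, dx), dims=(1, 2))
    return img1, img2
```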
17. The method of any of claims 1-16, wherein the target change detection network comprises a first input and a second input;
the obtaining, based on the first image feature and the second image feature, the image processing result for at least one of the two-phase images and the change detection result for the target in the two-phase images output by the target change detection network comprises:
taking the first image feature as the first input and the second image feature as the second input, and obtaining a first image processing result for at least one of the two-phase images and a first change detection result for the target in the two-phase images output by the target change detection network;
taking the first image feature as the second input and the second image feature as the first input, and obtaining a second image processing result for at least one of the two-phase images and a second change detection result for the target in the two-phase images output by the target change detection network;
and performing a weighted average of the first image processing result and the second image processing result to obtain a final image processing result for at least one of the two-phase images, and performing a weighted average of the first change detection result and the second change detection result to obtain a final change detection result for the target in the two-phase images.
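Claim 17's two-pass scheme feeds the pair of features once in each input order and averages the two sets of outputs. Below is a minimal sketch, assuming equal weights of 0.5 for the weighted average and a model that exposes the two-input interface of claim 17.

```python
import torch

@torch.no_grad()
def bidirectional_predict(model, feat1, feat2):
    seg_a, change_a = model(feat1, feat2)       # first pass: feat1 as the first input
    seg_b, change_b = model(feat2, feat1)       # second pass: inputs swapped
    seg = 0.5 * seg_a + 0.5 * seg_b             # final image processing result
    change = 0.5 * change_a + 0.5 * change_b    # final change detection result
    return seg, change
```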
18. The method according to any of claims 1-17, wherein, in a case where the target change detection network is a network to be trained, the method further comprises:
obtaining an image processing loss according to the image processing result of the at least one of the two-phase images and a corresponding image processing label value, and obtaining a change detection loss according to the change detection result of the target in the two-phase images and a corresponding change detection label value;
adjusting network parameters of the target change detection network based on the image processing loss and the change detection loss.
19. The method of claim 18, wherein the adjusting the network parameters of the target change detection network based on the image processing loss and the change detection loss comprises:
obtaining a total loss according to the image processing loss and the change detection loss;
and adjusting the network parameters of the target change detection network based on the total loss.
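Claims 18 and 19 combine the two task losses into a total loss used to update the network. A minimal sketch follows, assuming cross-entropy for both the image processing loss and the change detection loss and a weighted sum with illustrative weights alpha and beta.

```python
import torch.nn.functional as F

def total_loss(seg_logits, seg_labels, change_logits, change_labels, alpha=1.0, beta=1.0):
    image_processing_loss = F.cross_entropy(seg_logits, seg_labels)
    change_detection_loss = F.cross_entropy(change_logits, change_labels)
    return alpha * image_processing_loss + beta * change_detection_loss
```

In training, this total loss would be backpropagated and an optimizer step applied, per the parameter adjustment recited in claim 19.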
20. An image processing apparatus, characterized in that the apparatus comprises:
a feature extraction module, configured to perform feature extraction on two-phase images input into a target change detection network to obtain a first image feature and a second image feature;
and an image processing module, configured to obtain, based on the first image feature and the second image feature, an image processing result for at least one of the two-phase images and a change detection result for a target in the two-phase images output by the target change detection network.
21. An electronic device, characterized in that the device comprises:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to invoke executable instructions stored in the memory to implement the image processing method of any of claims 1-19.
22. A computer-readable storage medium, characterized in that the storage medium stores a computer program for executing the image processing method of any one of claims 1 to 19.
CN202110112824.8A 2021-01-27 2021-01-27 Image processing method, device, electronic equipment and storage medium Active CN112949388B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110112824.8A CN112949388B (en) 2021-01-27 2021-01-27 Image processing method, device, electronic equipment and storage medium
PCT/CN2021/121164 WO2022160753A1 (en) 2021-01-27 2021-09-28 Image processing method and apparatus, and electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110112824.8A CN112949388B (en) 2021-01-27 2021-01-27 Image processing method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112949388A 2021-06-11
CN112949388B CN112949388B (en) 2024-04-16

Family

ID=76238035

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110112824.8A Active CN112949388B (en) 2021-01-27 2021-01-27 Image processing method, device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112949388B (en)
WO (1) WO2022160753A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115457390A (en) * 2022-09-13 2022-12-09 中国人民解放军国防科技大学 Remote sensing image change detection method and device, computer equipment and storage medium
CN115908894A (en) * 2022-10-27 2023-04-04 中国科学院空天信息创新研究院 Optical remote sensing image ocean raft type culture area classification method based on panoramic segmentation
CN116091492B (en) * 2023-04-06 2023-07-14 中国科学技术大学 Image change pixel level detection method and system
CN117671437A (en) * 2023-10-19 2024-03-08 中国矿业大学(北京) Open stope identification and change detection method based on multitasking convolutional neural network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582043B (en) * 2020-04-15 2022-03-15 电子科技大学 High-resolution remote sensing image ground object change detection method based on multitask learning
CN112949388B (en) * 2021-01-27 2024-04-16 上海商汤智能科技有限公司 Image processing method, device, electronic equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020202505A1 (en) * 2019-04-03 2020-10-08 Nec Corporation Image processing apparatus, image processing method and non-transitoty computer readable medium
CN110263705A (en) * 2019-06-19 2019-09-20 上海交通大学 Towards two phase of remote sensing technology field high-resolution remote sensing image change detecting method
WO2021009141A1 (en) * 2019-07-12 2021-01-21 Neo, Netherlands Geomatics & Earth Observation B.V. Object-based change detection using a neural network
CN111815579A (en) * 2020-06-24 2020-10-23 浙江大华技术股份有限公司 Image change detection method and device and computer readable storage medium
CN112149585A (en) * 2020-09-27 2020-12-29 上海商汤智能科技有限公司 Image processing method, device, equipment and storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022160753A1 (en) * 2021-01-27 2022-08-04 上海商汤智能科技有限公司 Image processing method and apparatus, and electronic device and storage medium
CN114154645A (en) * 2021-12-03 2022-03-08 中国科学院空间应用工程与技术中心 Cross-center image joint learning method and system, storage medium and electronic equipment
CN114154645B (en) * 2021-12-03 2022-05-17 中国科学院空间应用工程与技术中心 Cross-center image joint learning method and system, storage medium and electronic equipment
CN116385881A (en) * 2023-04-10 2023-07-04 北京卫星信息工程研究所 Remote sensing image ground feature change detection method and device
CN116385881B (en) * 2023-04-10 2023-11-14 北京卫星信息工程研究所 Remote sensing image ground feature change detection method and device
CN116486085A (en) * 2023-04-27 2023-07-25 北京卫星信息工程研究所 Scene description method of remote sensing image
CN116486085B (en) * 2023-04-27 2023-12-19 北京卫星信息工程研究所 Scene description method of remote sensing image

Also Published As

Publication number Publication date
CN112949388B (en) 2024-04-16
WO2022160753A1 (en) 2022-08-04

Similar Documents

Publication Publication Date Title
CN112949388A (en) Image processing method and device, electronic equipment and storage medium
US20190073524A1 (en) Method and apparatus for predicting walking behaviors, data processing apparatus, and electronic device
CN112330664B (en) Pavement disease detection method and device, electronic equipment and storage medium
US9454851B2 (en) Efficient approach to estimate disparity map
WO2022062543A1 (en) Image processing method and apparatus, device and storage medium
US20230004797A1 (en) Physics-guided deep multimodal embeddings for task-specific data exploitation
US20190304161A1 (en) Dynamic real-time texture alignment for 3d models
CN114898315A (en) Driving scene information determination method, object information prediction model training method and device
CN114419570A (en) Point cloud data identification method and device, electronic equipment and storage medium
US20200160060A1 (en) System and method for multiple object tracking
CN114170438A (en) Neural network training method, electronic device and computer storage medium
CN112381828A (en) Positioning method, device, medium and equipment based on semantic and depth information
CN116235209A (en) Sparse optical flow estimation
CN113240023B (en) Change detection method and device based on change image classification and feature difference value prior
CN113592709B (en) Image super processing method, device, equipment and storage medium
CN113033439B (en) Method and device for data processing and electronic equipment
US20230087261A1 (en) Three-dimensional target estimation using keypoints
US20190073787A1 (en) Combining sparse two-dimensional (2d) and dense three-dimensional (3d) tracking
CN114972465A (en) Image target depth detection method and device, electronic equipment and storage medium
CN112052863B (en) Image detection method and device, computer storage medium and electronic equipment
CN111339226B (en) Method and device for constructing map based on classification detection network
CN111292340B (en) Semantic segmentation method, device, equipment and computer readable storage medium
CN114648757A (en) Three-dimensional target detection method and device
Fanani et al. Motion priors estimation for robust matching initialization in automotive applications
CN114372944B (en) Multi-mode and multi-scale fused candidate region generation method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant