CN117671432B - Method and device for training a change analysis model, electronic device and storage medium


Info

Publication number
CN117671432B
Authority
CN
China
Prior art keywords
change, image, features, sample image, training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410134211.8A
Other languages
Chinese (zh)
Other versions
CN117671432A (en)
Inventor
张开华
梁玲燕
赵雅倩
董刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Metabrain Intelligent Technology Co Ltd
Original Assignee
Suzhou Metabrain Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Metabrain Intelligent Technology Co Ltd filed Critical Suzhou Metabrain Intelligent Technology Co Ltd
Priority to CN202410134211.8A priority Critical patent/CN117671432B/en
Publication of CN117671432A publication Critical patent/CN117671432A/en
Application granted granted Critical
Publication of CN117671432B publication Critical patent/CN117671432B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method, an apparatus, an electronic device and a storage medium for training a change analysis model, relating to the technical field of data processing. The method comprises the following steps: taking each change sample image pair carrying a change text, together with the image change label of that pair, as a training sample to obtain a plurality of training samples, wherein each change sample image pair comprises a pre-change sample image and a post-change sample image of the same area; for any training sample, inputting the change sample image pair carrying the change text into a preset change analysis network and outputting the image change detection result corresponding to the training sample; and calculating a loss value based on the image change detection result corresponding to the training sample and the image change label, and completing training of the preset change analysis network when the loss value is smaller than a first preset threshold, to obtain a trained change analysis model.

Description

Method and device for training a change analysis model, electronic device and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and apparatus for training a change analysis model, an electronic device, and a storage medium.
Background
Remote sensing change detection is a technique aimed at identifying significant changes between remote sensing images captured at different times of the same location.
Change detection is a strongly task-dependent problem. Ground surface content is complex, and many similar but irrelevant objects can appear in the same area or at the same moment, for example shipping containers and buildings, both of which are common on construction sites. Moreover, the overhead view captured by a satellite loses much of the original optical information, such as geometric shape, so objects that differ greatly in reality can look alike in images, which severely affects the accuracy of change detection.
Therefore, how to improve the accuracy of change detection has become a pressing problem for the industry.
Disclosure of Invention
The invention provides a method and a device for training a change analysis model, electronic equipment and a storage medium, which are used for solving the problem of how to improve the accuracy of change detection in the prior art.
The invention provides a change analysis model training method, which comprises the following steps:
taking each change sample image pair carrying a change text and an image change label of the change sample image pair as a training sample to obtain a plurality of training samples, wherein each change sample image pair comprises a sample image before change and a sample image after change aiming at the same area;
For any training sample, inputting a change sample image pair carrying a change text in the training sample into a preset change analysis network, and outputting an image change detection result corresponding to the training sample;
Calculating a loss value based on an image change detection result corresponding to the training sample and the image change label, and completing training of the preset change analysis network under the condition that the loss value is smaller than a first preset threshold value to obtain a trained change analysis model;
The trained change analysis model is used for outputting an image change detection result of the category corresponding to the change text in the change image pair according to the input change text and the change image pair.
According to the method for training the change analysis model provided by the invention, a change sample image pair carrying a change text in the training sample is input into a preset change analysis network, and an image change detection result corresponding to the training sample is output, and the method comprises the following steps:
Inputting the change sample image pairs into a feature extraction module in the preset change analysis network, and outputting sample image features before change and sample image features after change with different sizes;
performing text image feature fusion processing on the change sample image pair and the change text through a Segment Anything Model (SAM) in the preset change analysis network to obtain pre-change prior masks and post-change prior masks of different sizes;
performing fusion processing on the prior mask before the change and the prior mask after the change, as well as the sample image characteristics before the change and the sample image characteristics after the change of different sizes through a mask attention module in the preset change analysis network, and determining multi-stage local double-time characteristics before the change and after the change;
Carrying out multi-scale feature fusion on the multi-stage local double-temporal features before and after the change and the sample image features before and after the change with different sizes through a multi-scale feature fusion module in the preset change analysis network to obtain the image features before the change, the semantic features before the change, the image features after the change and the semantic features after the change;
And performing segmentation decoding on the pre-change image features, the pre-change semantic features, the post-change image features and the post-change semantic features after splicing and feature fusion by a decoding module in the preset change analysis network, and outputting an image change detection result corresponding to the training sample.
According to the method for training the change analysis model provided by the invention, the pair of change sample images is input into the feature extraction module in the preset change analysis network, and the sample image features before and after the change with different sizes are output, and the method comprises the following steps:
respectively inputting the pre-change sample image and the post-change sample image into two parameter-sharing feature extraction branches in the feature extraction module;
and outputting, through multiple stages of feature extraction in the two branches, pre-change and post-change sample image features of different sizes with resolutions from high to low.
According to the method for training the change analysis model provided by the invention, the step of performing text image feature fusion processing on the change sample image pair and the change text through the Segment Anything Model in the preset change analysis network to obtain pre-change and post-change prior masks of different sizes comprises the following steps:
processing the change sample image pair and the change text through the Segment Anything Model, and outputting a pre-change prior mask and a post-change prior mask;
performing multi-scale downsampling processing on the prior mask before change and the prior mask after change to obtain prior masks before change and prior masks after change with different sizes;
the downsampling proportion of the multi-scale downsampling process is consistent with that of the feature extraction module.
According to the method for training the change analysis model provided by the invention, processing the change sample image pair and the change text through the Segment Anything Model and outputting the pre-change prior mask and the post-change prior mask comprises the following steps:
respectively inputting the pre-change sample image and the post-change sample image of the change sample image pair into the image feature encoder in the Segment Anything Model, and outputting pre-change and post-change sample image coding features;
encoding the change text into a text prompt encoding through the text encoder in the Segment Anything Model;
and combining the text prompt encoding with the pre-change sample image coding features and with the post-change sample image coding features respectively, inputting the combinations into the fusion decoder in the Segment Anything Model, and outputting the pre-change prior mask and the post-change prior mask respectively.
According to the method for training the change analysis model provided by the invention, the mask attention module in the preset change analysis network is used for carrying out fusion processing on the prior mask before change and the prior mask after change, as well as the sample image characteristics before change and the sample image characteristics after change, which are different in size, so as to determine the multi-stage local double-time characteristics before change and after change, and the method comprises the following steps:
performing linear mapping on the pre-change sample image features and the post-change sample image features to obtain the query, key and value corresponding to a mask attention mechanism;
fusing the query, key and value with the pre-change prior mask and the post-change prior mask through the mask attention mechanism to obtain fusion features;
and performing enhancement processing on the fusion features through a self-attention mechanism, and outputting the pre-change and post-change multi-stage local double-time features.
According to the method for training the change analysis model provided by the invention, the multi-scale feature fusion module in the preset change analysis network is used for carrying out multi-scale feature fusion on the multi-stage local double-time features before and after the change and the sample image features before and after the change with different sizes to obtain the image features before the change, the semantic features before the change, the image features after the change and the semantic features after the change, and the method comprises the following steps:
arranging the pre-change and post-change multi-stage local double-time features and the pre-change and post-change sample image features of different sizes from high to low resolution;
and performing convolution, normalization, activation and upsampling on the lowest-resolution multi-stage local double-time features and sample image features, splicing the result with the features of the next resolution, and repeating the convolution, normalization, activation and upsampling until the features of all resolutions are traversed, outputting the pre-change image features, the pre-change semantic features, the post-change image features and the post-change semantic features.
According to the method for training the change analysis model provided by the invention, the decoding module in the preset change analysis network is used for splicing and feature fusion of the image features before change, the semantic features before change, the image features after change and the semantic features after change, and then segmentation decoding is carried out, and the image change detection result corresponding to the training sample is output, which specifically comprises the following steps:
Splicing the image features before change, the semantic features before change, the image features after change and the semantic features after change in channel dimension, then upsampling, and compressing channel dimension of upsampling results through a full connection layer to obtain comprehensive features;
and inputting the comprehensive characteristics into a decoding head in a decoding module, and outputting an image change detection result corresponding to the training sample.
According to the method for training the change analysis model provided by the invention, the text prompt encoding is acquired as follows:

$$T_p = E_{text}(t)$$

where $E_{text}$ is the text encoder in the Segment Anything Model, $T_p$ denotes the text prompt encoding, and $t$ denotes the input change text.

According to the method for training the change analysis model provided by the invention, the pre-change and post-change sample image features are acquired as follows:

$$F_i = E_{img}(I_i), \quad i \in \{1, 2\}$$

where $E_{img}$ is the image feature encoder in the Segment Anything Model, $F_i$ denotes sample image features, $I_i$ denotes a change sample image, and $i = 1, 2$ indicate the pre-change time and the post-change time, respectively.

According to the method for training the change analysis model provided by the invention, the pre-change prior mask and the post-change prior mask are acquired as follows:

$$M_i = D_{fuse}(F_i, T_p), \quad i \in \{1, 2\}$$

where $M_i$ denotes a prior mask, $F_i$ denotes sample image features, $T_p$ denotes the text prompt encoding, and $D_{fuse}$ is the fusion decoder in the Segment Anything Model.

According to the method for training the change analysis model provided by the invention, the prior mask calculation is specifically:

$$M_1 = SAM(I_1, t), \quad M_2 = SAM(I_2, t)$$

where $M_1, M_2$ denote the pre-change prior mask and the post-change prior mask, respectively, $SAM$ denotes the Segment Anything large model, $I_1, I_2$ denote the input sample images, $t$ denotes the input change text, and $F_1, F_2$ denote the pre-change and post-change sample image features, respectively.

According to the method for training the change analysis model provided by the invention, the query, key and value are calculated as follows:

$$Q = f_Q(F_t), \quad K = f_K(F_t), \quad V = f_V(F_t), \quad t \in \{1, 2\}$$

where $Q$ is the query value, $K$ is the key value, $V$ is the value, $F_2$ denotes the post-change sample image features, $F_1$ denotes the pre-change sample image features, $F_t$ denotes the pre-change or post-change sample image features, and $f_Q, f_K, f_V$ denote three different linear mapping layers.

According to the change analysis model training method provided by the invention, the mask attention mechanism is specifically:

$$F_{fuse} = \mathrm{Norm}\big(\mathrm{MaskAttn}(Q, K, V, M_t)\big)$$

where $\mathrm{Norm}$ denotes the normalization layer, $\mathrm{MaskAttn}$ denotes the mask attention mechanism, and $M_t$ denotes the prior mask.

According to the change analysis model training method provided by the invention, the comprehensive features are calculated as follows:

$$F_{comp} = \mathrm{FC}\big(\mathrm{Up}(\mathrm{Cat}(G_1, G_2, P_1, P_2))\big)$$

where $G_1, G_2, P_1, P_2$ denote the pre-change image features, the post-change image features, the pre-change semantic features and the post-change semantic features, respectively, $C$ denotes the number of feature channels, $\mathrm{FC}$ denotes the fully connected layer, $\mathrm{Up}$ denotes upsampling, and $\mathrm{Cat}$ denotes the splice operation.

According to the method for training the change analysis model provided by the invention, the image change detection result corresponding to the training sample is specifically:

$$R = \mathrm{Head}(F_{comp})$$

where $\mathrm{Head}$ denotes a linear mapping layer (the decode head), $F_{comp}$ is the comprehensive feature, and $R$ is the image change detection result.
According to the method for training the change analysis model provided by the invention, after the step of training the preset change analysis network to obtain the trained change analysis model is completed, the method further comprises the following steps:
Acquiring a change image pair of the same region based on the pre-change image and the post-change image of the same region at different times;
and acquiring a change text of the change image pair, inputting the change text and the change image pair into the trained change analysis model, and outputting an image change detection result of the change image pair.
The invention also provides a device for training the change analysis model, which comprises:
The acquisition module is used for taking each change sample image pair carrying a change text and an image change label of the change sample image pair as a training sample to acquire a plurality of training samples, wherein each change sample image pair comprises a sample image before change and a sample image after change aiming at the same area;
the analysis module is used for inputting a change sample image pair carrying a change text in the training sample into a preset change analysis network for any training sample, and outputting an image change detection result corresponding to the training sample;
The training module is used for calculating a loss value based on an image change detection result corresponding to the training sample and the image change label, and completing training of the preset change analysis network under the condition that the loss value is smaller than a first preset threshold value to obtain a trained change analysis model;
The trained change analysis model is used for outputting an image change detection result of the category corresponding to the change text in the change image pair according to the input change text and the change image pair.
According to the device for training the change analysis model provided by the invention, the device is further used for:
Inputting the change sample image pairs into a feature extraction module in the preset change analysis network, and outputting sample image features before change and sample image features after change with different sizes;
performing text image feature fusion processing on the change sample image pair and the change text through a Segment Anything Model in the preset change analysis network to obtain pre-change prior masks and post-change prior masks of different sizes;
performing fusion processing on the prior mask before the change and the prior mask after the change, as well as the sample image characteristics before the change and the sample image characteristics after the change of different sizes through a mask attention module in the preset change analysis network, and determining multi-stage local double-time characteristics before the change and after the change;
Carrying out multi-scale feature fusion on the multi-stage local double-temporal features before and after the change and the sample image features before and after the change with different sizes through a multi-scale feature fusion module in the preset change analysis network to obtain the image features before the change, the semantic features before the change, the image features after the change and the semantic features after the change;
And performing segmentation decoding on the pre-change image features, the pre-change semantic features, the post-change image features and the post-change semantic features after splicing and feature fusion by a decoding module in the preset change analysis network, and outputting an image change detection result corresponding to the training sample.
According to the device for training the change analysis model provided by the invention, the device is further used for:
respectively inputting the pre-change sample image and the post-change sample image into two parameter-sharing feature extraction branches in the feature extraction module;
and outputting, through multiple stages of feature extraction in the two branches, pre-change and post-change sample image features of different sizes with resolutions from high to low.
According to the device for training the change analysis model provided by the invention, the device is further used for:
processing the change sample image pair and the change text through the Segment Anything Model, and outputting a pre-change prior mask and a post-change prior mask;
performing multi-scale downsampling processing on the prior mask before change and the prior mask after change to obtain prior masks before change and prior masks after change with different sizes;
the downsampling proportion of the multi-scale downsampling process is consistent with that of the feature extraction module.
According to the device for training the change analysis model provided by the invention, the device is further used for:
respectively inputting the pre-change sample image and the post-change sample image of the change sample image pair into the image feature encoder in the Segment Anything Model, and outputting pre-change and post-change sample image coding features;
encoding the change text into a text prompt encoding through the text encoder in the Segment Anything Model;
and combining the text prompt encoding with the pre-change sample image coding features and with the post-change sample image coding features respectively, inputting the combinations into the fusion decoder in the Segment Anything Model, and outputting the pre-change prior mask and the post-change prior mask respectively.
According to the device for training the change analysis model provided by the invention, the device is further used for:
performing linear mapping on the pre-change sample image features and the post-change sample image features to obtain the query, key and value corresponding to a mask attention mechanism;
fusing the query, key and value with the pre-change prior mask and the post-change prior mask through the mask attention mechanism to obtain fusion features;
and performing enhancement processing on the fusion features through a self-attention mechanism, and outputting the pre-change and post-change multi-stage local double-time features.
According to the device for training the change analysis model provided by the invention, the device is further used for:
arranging the pre-change and post-change multi-stage local double-time features and the pre-change and post-change sample image features of different sizes from high to low resolution;
and performing convolution, normalization, activation and upsampling on the lowest-resolution multi-stage local double-time features and sample image features, splicing the result with the features of the next resolution, and repeating the convolution, normalization, activation and upsampling until the features of all resolutions are traversed, outputting the pre-change image features, the pre-change semantic features, the post-change image features and the post-change semantic features.
According to the device for training the change analysis model provided by the invention, the device is further used for:
Splicing the image features before change, the semantic features before change, the image features after change and the semantic features after change in channel dimension, then upsampling, and compressing channel dimension of upsampling results through a full connection layer to obtain comprehensive features;
and inputting the comprehensive characteristics into a decoding head in a decoding module, and outputting an image change detection result corresponding to the training sample.
According to the device for training the change analysis model provided by the invention, the device is further used for:
Acquiring a change image pair of the same region based on the pre-change image and the post-change image of the same region at different times;
and acquiring a change text of the change image pair, inputting the change text and the change image pair into the trained change analysis model, and outputting an image change detection result of the change image pair.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of training the change analysis model as described in any one of the above when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of training a change analysis model as described in any one of the above.
The invention also provides a computer program product comprising a computer program which, when executed by a processor, implements the method of training a change analysis model as described in any one of the above.
According to the change analysis model training method and device, the electronic device and the storage medium, the change text and the change sample image pairs are introduced into the training of the change analysis model used for change detection. The change text effectively guides the model toward the intended change category during training, so that the change analysis model can accurately identify image changes of the category corresponding to the change text, which improves the detection accuracy of the change detection model and reduces false detections of change objects of categories not corresponding to the text.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a training method of a change analysis model in an embodiment of the application;
FIG. 2 is a schematic diagram of a model training process according to an embodiment of the present application;
FIG. 3 is a schematic diagram of fusion provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of a mask attention module according to an embodiment of the present application;
FIG. 5 is a global feature fusion decoding schematic diagram according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a training device for a change analysis model according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Fig. 1 is a schematic flow chart of a training method of a change analysis model according to an embodiment of the present application, as shown in fig. 1, including:
Step 110, taking each change sample image pair carrying a change text and an image change label of the change sample image pair as a training sample, and obtaining a plurality of training samples, wherein each change sample image pair comprises a sample image before change and a sample image after change aiming at the same area;
The change text described in the embodiment of the application can be a text corresponding to a change category in an image, for example, a building change, a road change, a woodland change, a farmland change, a lake change, and the like.
The pair of changed sample images described in the embodiment of the application specifically refers to two changed sample images acquired by the same area in different time periods, wherein the two changed sample images comprise a sample image before change and a sample image after change.
In the embodiment of the application, the sample image before the change is a sample image generated earlier than the sample image after the change, and the change condition of the area can be effectively determined by comparing the sample image before the change with the sample image after the change.
In an alternative embodiment, the pre-change sample image and the post-change sample image may specifically be remote sensing images, such as satellite remote sensing images, i.e. images generated for the same area by remote sensing technology, more specifically, the pre-change sample image and the post-change sample image may comprise building images, road images, tillage images, etc.
In the embodiment of the application, the image change label may be specifically a binarized image, where the changed area may be highlighted.
In an alternative embodiment, a large number of sample image pairs may be collected, with images of the same region at different points in time. Each of these image pairs should be marked to indicate which regions have changed and which have not. The image can be preprocessed, such as size adjustment, clipping, gray level conversion, normalization and the like, so that training efficiency and accuracy of the model are improved.
In an alternative embodiment, for each sample image pair, a corresponding image change label may also be generated that highlights the change area. These labels are typically binary, with the change region indicated by 1 and the unchanged region indicated by 0, and thus a label corresponding to each change sample image pair is obtained, generating a plurality of training samples.
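To make this concrete, the following is a minimal sketch of how such training samples might be assembled in PyTorch; the dataset class and all field names are illustrative assumptions, not part of the patent.

```python
# Hypothetical sketch: pairing pre/post-change images with a change text
# and a binary change label (1 = changed, 0 = unchanged), as described above.
import torch
from torch.utils.data import Dataset

class ChangeSampleDataset(Dataset):
    def __init__(self, pre_images, post_images, labels, change_text):
        # pre_images/post_images: [N, 3, H, W] tensors of the same area
        # labels: [N, H, W] binary masks highlighting changed regions
        self.pre, self.post = pre_images, post_images
        self.labels = labels
        self.text = change_text  # e.g. "building"

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        # Each training sample: a change sample image pair carrying the
        # change text, plus its image change label.
        return self.pre[i], self.post[i], self.text, self.labels[i]
```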
Step 120, for any training sample, inputting a pair of change sample images carrying a change text in the training sample into a preset change analysis network, and outputting an image change detection result corresponding to the training sample;
In the embodiment of the application, the preset change analysis network may specifically include a feature extraction module, a Segment Anything Model, a mask attention module, a multi-scale feature fusion module and a decoding module.
In the embodiment of the application, fig. 2 is a schematic diagram of a model training flow provided in the embodiment of the application. As shown in fig. 2, after the change sample image pair carrying the change text, that is, the pre-change sample image and the post-change sample image, is input into the preset change analysis network, feature extraction can be performed on the change sample image pair through the feature extraction module to obtain pre-change and post-change sample image features of different resolutions; and the change text and the change sample image pair can be fused through the Segment Anything Model to obtain a pre-change prior mask and a post-change prior mask fused with text semantic features.
And then, the prior mask and the sample image features are fused through a mask attention module and a multi-scale feature fusion module, and then input into a decoding module to obtain an image change detection result corresponding to the training sample.
Step 130, calculating a loss value based on an image change detection result corresponding to the training sample and the image change label, and completing training of the preset change analysis network under the condition that the loss value is smaller than a first preset threshold value to obtain a trained change analysis model;
The trained change analysis model is used for outputting an image change detection result of the category corresponding to the change text in the change image pair according to the input change text and the change image pair.
In an embodiment of the application, a loss function may be used to measure the difference between the model prediction and the ground-truth label. For image change detection tasks, common loss functions include mean square error (MSE), cross-entropy loss (Cross-Entropy Loss), and other loss functions suitable for regression or classification tasks; the choice can be made according to actual needs. In the embodiment of the application, cross-entropy loss may be selected as the loss function.
In the training process, the input change sample image pair is processed through a network to obtain the output of the model.
Comparing the model output with the image change label, and calculating a loss value by using a defined loss function. The smaller the loss value, the closer the prediction of the model is to the real label.
If the calculated loss value is greater than the preset threshold, the gradient is calculated by a backpropagation algorithm, and the weights of the network are updated using an optimization algorithm (such as SGD, Adam, etc.) to reduce the loss value.
Repeating the process through a plurality of training samples, and updating the network weight for each iteration until the loss value reaches or is lower than a first preset threshold value, and stopping training to obtain a trained change analysis model.
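As a rough illustration of this training loop, the sketch below uses cross-entropy loss and stops once the loss falls below a first preset threshold; the network interface, the threshold value and the use of Adam are assumptions for illustration, not details fixed by the patent.

```python
import torch
import torch.nn.functional as F

def train(network, loader, threshold=0.05, lr=1e-4, max_epochs=100):
    # Hypothetical loop: stop once the loss drops below the preset threshold.
    opt = torch.optim.Adam(network.parameters(), lr=lr)
    for epoch in range(max_epochs):
        for pre, post, text, label in loader:
            logits = network(pre, post, text)        # [B, 2, H, W]
            loss = F.cross_entropy(logits, label.long())
            if loss.item() < threshold:              # first preset threshold
                return network                        # training complete
            opt.zero_grad()
            loss.backward()                           # backpropagation
            opt.step()                                # weight update
    return network
```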
In the embodiment of the application, a trained change analysis model is applied to an actual scene, a change text and a change image pair are input, and the model outputs an image change detection result of a category corresponding to the change text in the change image pair.
According to the embodiment of the application, the change text and the change sample image pairs are introduced into the training process of the change analysis model used for change detection. The change text effectively guides the model toward the intended change category during training, so that the change analysis model can more accurately identify image changes of the category corresponding to the change text, improving the detection accuracy of the change detection model and reducing false detections of change objects of categories not corresponding to the text.
Optionally, inputting the pair of change sample images carrying the change text in the training sample into a preset change analysis network, and outputting an image change detection result corresponding to the training sample, including:
Inputting the change sample image pairs into a feature extraction module in the preset change analysis network, and outputting sample image features before change and sample image features after change with different sizes;
performing text image feature fusion processing on the change sample image pair and the change text through a Segment Anything Model in the preset change analysis network to obtain pre-change prior masks and post-change prior masks of different sizes;
performing fusion processing on the prior mask before the change and the prior mask after the change, as well as the sample image characteristics before the change and the sample image characteristics after the change of different sizes through a mask attention module in the preset change analysis network, and determining multi-stage local double-time characteristics before the change and after the change;
Carrying out multi-scale feature fusion on the multi-stage local double-temporal features before and after the change and the sample image features before and after the change with different sizes through a multi-scale feature fusion module in the preset change analysis network to obtain the image features before the change, the semantic features before the change, the image features after the change and the semantic features after the change;
And performing segmentation decoding on the pre-change image features, the pre-change semantic features, the post-change image features and the post-change semantic features after splicing and feature fusion by a decoding module in the preset change analysis network, and outputting an image change detection result corresponding to the training sample.
In the embodiment of the application, an appropriate change text can be selected according to the type of dataset; for example, the change text "building" can be used for a building change sample image dataset, and the change text "lake" for a lake change sample image dataset. The change text thus assists the model in accurately detecting changes of the corresponding category from the image pair.
Optionally, inputting the pair of changed sample images into a feature extraction module in the preset change analysis network, outputting sample image features before and sample image features after the change with different sizes, including:
Respectively inputting the sample image before change and the sample image after change into feature extraction branches of two sharing parameters in the feature extraction module;
and respectively outputting the sample image features before and the sample image features after the change with different sizes and with the resolution from high to low through the feature extraction of a plurality of stages of the two feature extraction branches.
In the embodiment of the application, the feature extraction module may specifically be a twin feature extraction network comprising two branches: the network consists of two identical branches, each branch processes one input image, and the two branches share parameters.
Each branch contains multiple stages, each consisting of a convolutional layer, a normalization layer and an activation function, typically constructed using Transformer blocks.
The output of each branch is a set of feature maps of different resolutions, from high to low, typically halving the resolution per stage; multi-scale features are thus extracted by the branches.
In an alternative embodiment, the feature extraction module may comprise a four-stage twin feature extraction network. The pre-change sample image and the post-change sample image are respectively input into the two parameter-sharing feature extraction branches, and after four stages of feature extraction, the pre-change sample image features $F_1^{(s)}$ and the post-change sample image features $F_2^{(s)}$, $s = 1, \dots, 4$, are output at multiple sizes from high to low resolution.
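A minimal sketch of such a twin (parameter-sharing) four-stage extractor is given below; it uses plain conv + norm + activation stages instead of the Transformer blocks mentioned above, purely to keep the example short, and the channel widths are illustrative.

```python
import torch
import torch.nn as nn

class TwinBackbone(nn.Module):
    # Four stages, each halving resolution; both images share parameters.
    def __init__(self, in_ch=3, widths=(64, 128, 256, 512)):
        super().__init__()
        stages, prev = [], in_ch
        for w in widths:
            stages.append(nn.Sequential(
                nn.Conv2d(prev, w, 3, stride=2, padding=1),
                nn.BatchNorm2d(w), nn.GELU()))
            prev = w
        self.stages = nn.ModuleList(stages)

    def forward_one(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)  # resolutions ordered from high to low
        return feats

    def forward(self, pre, post):
        # Shared weights: the same stages process both temporal images.
        return self.forward_one(pre), self.forward_one(post)
```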
Optionally, the step of performing text image feature fusion processing on the change sample image pair and the change text through the Segment Anything Model in the preset change analysis network to obtain pre-change and post-change prior masks of different sizes includes:
processing the change sample image pair and the change text through the Segment Anything Model, and outputting a pre-change prior mask and a post-change prior mask;
performing multi-scale downsampling processing on the prior mask before change and the prior mask after change to obtain prior masks before change and prior masks after change with different sizes;
the downsampling proportion of the multi-scale downsampling process is consistent with that of the feature extraction module.
In an embodiment of the application, a Segment Anything Model (SEGMENT ANYTHING Model, SAM) is used, and its text prompt encoder encodes the input change text (e.g. "building") into a text prompt encoding.
Meanwhile, an image encoder encodes an input image into image coding features; the text prompt encoding and the image coding features are concatenated and then input into the SAM decoder, which fuses the two feature vectors, thereby generating a prior mask M carrying semantic information.
In this way, the SAM successfully passes the semantic information in the change text into the generated prior mask M, producing the pre-change prior mask $M_1$ and the post-change prior mask $M_2$, and realizing the fusion of the change text and the images.
In the embodiment of the application, to obtain multiple resolutions of the masks and facilitate processing, the prior masks $M_1$ and $M_2$ can further be downsampled at multiple scales, the downsampling ratios being kept consistent with those of the pre-change and post-change sample images in the feature extraction module, finally obtaining pre-change and post-change prior masks of different sizes whose resolutions match those of the pre-change and post-change sample image features.
In an alternative embodiment, the prior masks $M_1$ and $M_2$ may each be downsampled at four scales, obtaining prior masks $M_1^{(s)}$ and $M_2^{(s)}$, $s = 1, \dots, 4$, consistent with the resolutions of the sample image features before and after the change.
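A one-function sketch of this multi-scale downsampling, assuming the prior mask is a [B, 1, H, W] tensor and each backbone feature map supplies the target size, so the ratios match the feature extraction module by construction:

```python
import torch.nn.functional as F

def downsample_prior_masks(mask, feature_maps):
    # Resize the prior mask M to each backbone stage's resolution so the
    # downsampling ratios match the feature extraction module exactly.
    return [F.interpolate(mask, size=fm.shape[-2:], mode="nearest")
            for fm in feature_maps]
```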
Optionally, processing the change sample image pair and the change text through the Segment Anything Model and outputting the pre-change prior mask and the post-change prior mask includes:
respectively inputting the pre-change sample image and the post-change sample image of the sample image pair into the image feature encoder in the Segment Anything Model, and outputting pre-change and post-change sample image coding features;
encoding the change text into a text prompt encoding through the text encoder in the Segment Anything Model;
and combining the text prompt encoding with the pre-change and the post-change sample image coding features respectively, inputting the combinations into the fusion decoder in the Segment Anything Model, and outputting the pre-change prior mask and the post-change prior mask respectively.
In the embodiment of the application, the change text is input into the text encoder in the Segment Anything Model to obtain the text prompt encoding. The text encoder converts the text from a form the computer cannot process into one it can process, compressing the change text information into a feature vector. The specific formula is as follows:

$$T_p = E_{text}(t)$$

where $E_{text}$ is the text encoder in the Segment Anything Model, $T_p$ denotes the text prompt encoding, and $t$ denotes the input change text.

In the embodiment of the application, the pre-change sample image and the post-change sample image are respectively input into the image feature encoder in the Segment Anything Model, which converts each image into image coding features.

Optionally, the pre-change and post-change sample image features are acquired as follows:

$$F_i = E_{img}(I_i), \quad i \in \{1, 2\}$$

where $E_{img}$ is the image feature encoder in the Segment Anything Model, $F_i$ denotes sample image features, $I_i$ denotes a change sample image, and $i = 1, 2$ indicate the pre-change time and the post-change time, respectively.

In the embodiment of the present application, the text prompt encoding may be further combined with the pre-change and the post-change sample image coding features and sent to the decoder to obtain prior masks carrying text semantic class information, specifically:

$$M_i = D_{fuse}(F_i, T_p), \quad i \in \{1, 2\}$$

where $M_1, M_2$ denote the pre-change prior mask and the post-change prior mask, $F_i$ denotes the sample image features, $T_p$ denotes the text prompt encoding, and $D_{fuse}$ is the fusion decoder in the Segment Anything Model.

In an alternative embodiment, the processing of the Segment Anything Model may be reduced to the following formula:

$$M = SAM(I, t)$$

where $SAM$ denotes the Segment Anything large model, $I$ denotes an input image, and $t$ denotes the input change text. Since the input of change detection is a dual-time image pair, the two images can be processed in parallel to acquire their corresponding prior masks, which can be expressed as:

$$M_1 = SAM(I_1, t), \quad M_2 = SAM(I_2, t)$$

where $M_1, M_2$ are the prior masks corresponding to the pre-change and post-change images, and $I_1, I_2$ are the pre-change sample image and the post-change sample image, respectively.
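Schematically, the dual-time prior-mask generation can be wired up as below. Note this is a stand-in: the publicly released SAM has no text prompt encoder, so `text_encoder`, `image_encoder` and `fusion_decoder` here are placeholders for the patent's $E_{text}$, $E_{img}$ and $D_{fuse}$.

```python
import torch
import torch.nn as nn

class PriorMaskPrompter(nn.Module):
    # Stand-ins for the SAM components: text encoder E_text, image encoder
    # E_img, and fusion decoder D_fuse producing a prior mask M.
    def __init__(self, text_encoder, image_encoder, fusion_decoder):
        super().__init__()
        self.e_text = text_encoder
        self.e_img = image_encoder
        self.d_fuse = fusion_decoder

    def forward(self, pre_img, post_img, change_text):
        t_p = self.e_text(change_text)          # T_p = E_text(t)
        f1 = self.e_img(pre_img)                # F_1 = E_img(I_1)
        f2 = self.e_img(post_img)               # F_2 = E_img(I_2)
        m1 = self.d_fuse(f1, t_p)               # M_1: pre-change prior mask
        m2 = self.d_fuse(f2, t_p)               # M_2: post-change prior mask
        return m1, m2
```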
According to the embodiment of the application, the Segment Anything Model can effectively fuse text and image features, and the introduction of the change text effectively guides the model toward the intended change category during training, so that the change analysis model can more accurately identify image changes of the category corresponding to the change text, improving the model's recognition accuracy for the specific category of change.
Optionally, the step of determining the multi-stage local dual-temporal features before and after the change by performing fusion processing on the prior mask before the change and the prior mask after the change, and the sample image features before the change and the sample image features after the change, which are different in size, through a mask attention module in the preset change analysis network includes:
performing linear mapping on the pre-change sample image features and the post-change sample image features to obtain the query, key and value corresponding to a mask attention mechanism;
fusing the query, key and value with the pre-change prior mask and the post-change prior mask through the mask attention mechanism to obtain fusion features;
and performing enhancement processing on the fusion features through a self-attention mechanism, and outputting the pre-change and post-change multi-stage local double-time features.
In an alternative embodiment, fig. 3 is a schematic diagram of the fusion provided in an embodiment of the present application. As shown in fig. 3, the change text and the change sample image pair are fused mainly through a mask attention module and a multi-stage feature fusion module: the mask attention module fuses the change text with the change sample image pair, while the multi-stage feature fusion module fuses the multi-stage fusion information to obtain rich features combining coarse and fine granularity.
Fig. 4 is a schematic diagram of the mask attention module provided by the embodiment of the present application. As shown in fig. 4, the pre-change and post-change sample image features are linearly mapped to obtain the corresponding Query, Key and Value. The specific formula is as follows:

$$Q = f_Q(F_t), \quad K = f_K(F_t), \quad V = f_V(F_t), \quad t \in \{1, 2\}$$

where $Q$ is the query value, $K$ is the key value, $V$ is the value, $F_2$ denotes the post-change sample image features, $F_1$ denotes the pre-change sample image features, $F_t$ denotes the pre-change or post-change sample image features, and $f_Q, f_K, f_V$ denote three different linear mapping layers.

In the embodiment of the application, a mask attention mechanism is introduced to fuse the query, key and value with the pre-change and post-change prior masks, obtaining the fusion features. The specific formula is as follows:

$$F_{fuse} = \mathrm{Norm}\big(\mathrm{MaskAttn}(Q, K, V, M_t)\big)$$

where $\mathrm{Norm}$ denotes the normalization layer, $\mathrm{MaskAttn}$ denotes the mask attention mechanism, and $M_t$ denotes the multi-stage prior mask.

The fusion features then undergo long-range modeling through a self-attention mechanism, which strengthens the expressive power of the features and yields local features carrying semantic information. The specific formula is as follows:

$$L_t = \mathrm{FFN}\big(\mathrm{Norm}(f(\mathrm{SelfAttn}(F_{fuse})))\big)$$

where $F_{fuse}$ denotes the fusion features, $L_t$ denotes the self-attention-enhanced local features, $f$ denotes a linear mapping layer, $\mathrm{Norm}$ denotes the normalization layer, and $\mathrm{FFN}$ denotes the fully connected layer, finally obtaining the pre-change and post-change multi-stage local double-time features $L_1^{(s)}$ and $L_2^{(s)}$.
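A compact sketch of one mask-attention stage under these definitions is shown below. The prior mask is applied as an additive attention bias (Mask2Former-style), which is one plausible reading of $\mathrm{MaskAttn}$; the patent does not spell out this detail, and the head count and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class MaskAttentionBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.f_q = nn.Linear(dim, dim)  # three different linear mappings
        self.f_k = nn.Linear(dim, dim)
        self.f_v = nn.Linear(dim, dim)
        self.norm1 = nn.LayerNorm(dim)
        self.self_attn = nn.MultiheadAttention(dim, 8, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, feats, prior_bias):
        # feats: [B, N, C] flattened sample image features (one time step)
        # prior_bias: [B, N, N] additive bias derived from the prior mask M
        q, k, v = self.f_q(feats), self.f_k(feats), self.f_v(feats)
        attn = torch.softmax(
            q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5 + prior_bias, -1)
        fused = self.norm1(attn @ v)              # mask attention + Norm
        enhanced, _ = self.self_attn(fused, fused, fused)  # self-attention
        return self.ffn(self.norm2(enhanced))     # local features L_t
```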
Optionally, the multi-scale feature fusion module in the preset change analysis network performs multi-scale feature fusion on the multi-stage local dual-temporal features before and after the change and the sample image features before and after the change with different sizes to obtain an image feature before the change, a semantic feature before the change, an image feature after the change and a semantic feature after the change, including:
arranging the pre-change and post-change multi-stage local double-time features and the pre-change and post-change sample image features of different sizes from high to low resolution;
and performing convolution, normalization, activation and upsampling on the lowest-resolution multi-stage local double-time features and sample image features, splicing the result with the features of the next resolution, and repeating the convolution, normalization, activation and upsampling until the features of all resolutions are traversed, outputting the pre-change image features, the pre-change semantic features, the post-change image features and the post-change semantic features.
In the embodiment of the application, the multi-stage features can be ordered according to the resolution, the multi-stage local double-time features before and after the change of the lowest resolution, and the sample image features before and after the change are fused in the first stage, specifically, convolution, normalization, activation, up-sampling and other processes are performed for fusion.
Then, splicing the multi-stage local double-time features before and after the change of the secondary low resolution, the sample image features before and after the change and the features fused in the first stage, and fusing again after splicing until the features with all the size resolutions are traversed, and outputting the image features before the change, the semantic features before the change, the image features after the change and the semantic features after the change.
In an alternative embodiment, the multi-stage features $\{L_t^{(s)}, F_t^{(s)}\}$ are ordered by resolution and enter the first fusion stage in parallel. The input of the first fusion stage is the lowest-resolution features $L_t^{(4)}$ and $F_t^{(4)}$, which are fused to give the intermediate features $F_{mid}^{(4)}$.

$F_{mid}^{(s)}$ is passed through a 3x3 convolution, a normalization operation and the activation function, then upsampled to the size of the next stage, where it is spliced and fused with the result of a 1x1 convolution applied to that stage's features. The 3x3 convolution and 1x1 convolution operate as follows:

$$F_{mid}^{(s-1)} = \mathrm{Cat}\Big(\mathrm{Up}\big(\mathrm{Act}(\mathrm{Norm}(\mathrm{Conv}_{3\times3}(F_{mid}^{(s)})))\big),\ \mathrm{Conv}_{1\times1}(\mathrm{Cat}(L_t^{(s-1)}, F_t^{(s-1)}))\Big)$$

where $F_{mid}^{(s)}$ denotes the intermediate-state features of the fusion of the two branches, $C$ denotes the number of channels of the feature vectors, $\mathrm{Cat}$ denotes the splice operation, $\mathrm{Conv}_{3\times3}$ denotes a convolution with kernel size 3x3, and $\mathrm{Conv}_{1\times1}$ denotes a convolution with kernel size 1x1.

Traversing all resolution features, the above fusion operation is repeated for each stage, and the pre-change and post-change branches are then fused to obtain the pre-change global features $G_1$, the post-change global features $G_2$, the pre-change local features $P_1$ and the post-change local features $P_2$.
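Under the same assumptions, the cascade can be realized as an FPN-style top-down pathway: each stage's mask-attention features and backbone features are spliced and reduced by a 1x1 convolution, while the running feature is refined with a 3x3 convolution, normalization and activation, then upsampled and spliced with the next stage. Channel widths are illustrative; the sketch processes one temporal branch and would be called once per branch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    # Fuses features from lowest to highest resolution, as described above.
    def __init__(self, stage_chans=(512, 256, 128, 64), width=64):
        super().__init__()
        self.reduce = nn.ModuleList(
            nn.Conv2d(2 * c, width, 1) for c in stage_chans)  # 1x1 convs
        self.refine = nn.ModuleList(
            nn.Sequential(nn.Conv2d(width if i == 0 else 2 * width,
                                    width, 3, padding=1),
                          nn.BatchNorm2d(width), nn.GELU())
            for i in range(len(stage_chans)))                 # 3x3 convs

    def forward(self, local_feats, img_feats):
        # Inputs ordered from lowest to highest resolution.
        x = None
        for i, (l, f) in enumerate(zip(local_feats, img_feats)):
            cur = self.reduce[i](torch.cat([l, f], dim=1))    # splice + 1x1
            if x is not None:
                x = F.interpolate(x, size=cur.shape[-2:],
                                  mode="bilinear", align_corners=False)
                cur = torch.cat([x, cur], dim=1)              # splice stages
            x = self.refine[i](cur)                           # conv/norm/act
        return x  # highest-resolution fused feature for this branch
```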
Optionally, the step of splitting and decoding the image feature before the change, the semantic feature before the change, the image feature after the change and the semantic feature after the change by using a decoding module in the preset change analysis network and outputting an image change detection result corresponding to the training sample specifically includes:
Splicing the image features before change, the semantic features before change, the image features after change and the semantic features after change in channel dimension, then upsampling, and compressing channel dimension of upsampling results through a full connection layer to obtain comprehensive features;
and inputting the comprehensive characteristics into a decoding head in a decoding module, and outputting an image change detection result corresponding to the training sample.
In an embodiment of the present application, fig. 5 is a schematic diagram of global feature fusion decoding provided in the embodiment of the present application. As shown in fig. 5, the pre-change global and local features are fused with the post-change global and local features, and the pre-change and post-change features are compared to discover the position of the changed region.
The pre-change global features $G^{t_1}$ and local features $L^{t_1}$, together with the post-change global features $G^{t_2}$ and local features $L^{t_2}$, are spliced in the channel dimension and up-sampled, and the channel dimension of the result is compressed through the fully connected layer to obtain the final comprehensive feature, with the specific formula:

$$F_{syn} = \mathrm{FC}_{4C \rightarrow C}\big(\mathrm{Up}\big(\mathrm{Cat}(G^{t_1}, L^{t_1}, G^{t_2}, L^{t_2})\big)\big)$$

wherein $C$ represents the number of feature channels, $\mathrm{FC}$ represents the fully connected layer, $\mathrm{Up}$ represents up-sampling, and $\mathrm{Cat}(\cdot)$ represents the stitching operation.

In the embodiment of the application, after the comprehensive feature is obtained, it can be sent to the segmentation decoding head to obtain the final image change detection result $\hat{Y}$, with the specific formula:

$$\hat{Y} = \phi(F_{syn})$$

wherein $\phi$ denotes the segmentation decoding head.
In the embodiment of the present application, the image change detection result may specifically be a binarized image.
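A compact sketch of this global-local fusion decoding follows; the single 1x1 segmentation head, sigmoid output and 2x up-sampling factor are assumptions consistent with the description, not the embodiment's exact decoding head:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalLocalDecoder(nn.Module):
    """Splice the four features along the channel dimension, up-sample,
    compress 4C -> C with a fully connected layer, then decode a change map."""

    def __init__(self, c: int):
        super().__init__()
        self.compress = nn.Linear(4 * c, c)        # channel-wise FC layer
        self.seg_head = nn.Conv2d(c, 1, kernel_size=1)

    def forward(self, g_pre, l_pre, g_post, l_post):
        x = torch.cat([g_pre, l_pre, g_post, l_post], dim=1)   # (B, 4C, H, W)
        x = F.interpolate(x, scale_factor=2, mode="bilinear",
                          align_corners=False)
        x = self.compress(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        return torch.sigmoid(self.seg_head(x))     # threshold to binarize
```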
In an alternative embodiment, a remote sensing image change detection dataset containing dual-temporal images is obtained, and a suitable text prompt is selected according to the category of the dataset; for a building change detection dataset, for example, text prior information such as "building" can be used. The segmentation model is prompted with this text to generate segmentation results of the dual-temporal images corresponding to the text category in the dataset. Next, the following steps 1 to 6 are executed to build the remote sensing image change detection model. Finally, the remote sensing image change detection model is applied to obtain the detection result $\hat{Y}$ for the change target.
Step 1: a twin feature extraction network comprising four stages is constructed. The pre-change remote sensing image and the post-change remote sensing image are respectively input into two parameter-shared feature extraction branches, four stages of features are extracted from each, and the branches respectively output, with resolution from high to low, the pre-change image features $\{F_i^{t_1}\}_{i=1}^{4}$ and the post-change image features $\{F_i^{t_2}\}_{i=1}^{4}$ (a minimal encoder sketch follows).
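The following PyTorch sketch illustrates the parameter-shared twin backbone; the four-stage layout and channel widths are illustrative assumptions, not the embodiment's exact architecture:

```python
import torch
import torch.nn as nn

class TwinEncoder(nn.Module):
    """Parameter-shared (siamese) backbone: applying the same module to both
    temporal images is what makes the two branches share weights."""

    def __init__(self, in_ch: int = 3, widths=(64, 128, 256, 512)):
        super().__init__()
        self.stages = nn.ModuleList()
        for out_ch in widths:
            self.stages.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            ))
            in_ch = out_ch

    def forward(self, img: torch.Tensor):
        feats = []
        x = img
        for stage in self.stages:
            x = stage(x)
            feats.append(x)          # collected from high to low resolution
        return feats

# encoder = TwinEncoder()
# f_pre, f_post = encoder(img_t1), encoder(img_t2)   # shared parameters
```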
Step 2: a text-prompted prior mask prompting module is constructed. The pre-change and post-change optical RGB images are respectively fed into the image encoder of the Segment Anything Model (SAM, the "segmentation all model") to obtain the pre-change and post-change image features; a text prompt is selected according to the specific dataset and encoded by the prompt encoder into a text prompt encoding $T$; the image encoder results are then each combined with the prompt encoder result and fed into the SAM decoder, yielding the pre-change prior mask $M^{t_1}$ and the post-change prior mask $M^{t_2}$ respectively (a schematic sketch follows).
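Schematically, the prior-mask generation can be written as below; `image_encoder`, `text_encoder` and `mask_decoder` are stand-in callables for the frozen segmentation model's components, not its actual API:

```python
import torch

def text_prompted_prior_masks(image_encoder, text_encoder, mask_decoder,
                              img_t1, img_t2, change_text: str):
    """Encode both temporal images and the text prompt, then decode each
    (image embedding, text embedding) pair into a class-specific prior mask."""
    with torch.no_grad():                    # the prompt model stays frozen
        e_t1 = image_encoder(img_t1)         # pre-change image embedding
        e_t2 = image_encoder(img_t2)         # post-change image embedding
        t = text_encoder(change_text)        # e.g. the prompt "building"
        mask_t1 = mask_decoder(e_t1, t)      # pre-change prior mask
        mask_t2 = mask_decoder(e_t2, t)      # post-change prior mask
    return mask_t1, mask_t2
```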
Step 3: a prior mask module is constructed. The prior masks $M^{t_1}$ and $M^{t_2}$ obtained in step 2 are each subjected to four multi-scale downsamplings, with downsampling ratios consistent with those applied to the optical RGB images in step 1, so that prior masks are obtained whose resolutions match the two groups of RGB features output by step 1 (see the helper sketched below).
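A small helper along these lines could build the mask pyramid; it assumes, as in the encoder sketch above, that each backbone stage halves the resolution:

```python
import torch.nn.functional as F

def pyramid_masks(mask, num_stages: int = 4):
    """Downsample a prior mask (B, 1, H, W) so each level matches one stage
    of the feature pyramid; adjust the ratios to match the actual backbone."""
    return [
        F.interpolate(mask, scale_factor=0.5 ** (i + 1),
                      mode="bilinear", align_corners=False)
        for i in range(num_stages)
    ]
```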
Step 4: a mask attention module is constructed. The pre-change image features and post-change image features obtained in step 1 are differenced, and the difference is linearly transformed into a new change feature that serves as the Value. In addition, the pre-change image features are mapped by two different linear layers to obtain the Query and Key values of the attention mechanism, and the post-change image features are likewise mapped to obtain their corresponding Query and Key values. Mask attention is then introduced: masked attention is computed from the known Query, Key and Value together with the prior mask values, self-attention is applied, and the final semantically embedded features are obtained through a fully connected mapping, giving the pre-change features and the post-change features respectively (a sketch of the masked-attention form follows).
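The sketch below follows the standard masked-attention form, with the mask added to the attention logits before normalization; treating the prior mask as an additive (B, N, N) bias and taking the Value from the feature difference are assumptions consistent with the description above:

```python
import math
import torch
import torch.nn as nn

class MaskAttention(nn.Module):
    """Mask-guided attention: Q and K from linear maps of one temporal
    feature, V from the linearly mapped pre/post feature difference, and the
    prior mask added to the attention logits before the softmax."""

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, dim)       # final fully connected mapping

    def forward(self, feat, diff, prior_mask):
        # feat, diff: (B, N, C) flattened features; prior_mask: (B, N, N)
        q, k = self.q_proj(feat), self.k_proj(feat)
        v = self.v_proj(diff)                            # change feature as Value
        attn = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
        attn = (attn + prior_mask).softmax(dim=-1)       # mask biases attention
        return self.out(attn @ v)
```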
Step 5: a multi-scale feature fusion module is constructed. The pre-change and post-change features obtained in step 4 and the pre-change and post-change image features from step 1 are subjected to multi-scale feature fusion: the low-resolution features undergo convolution -> normalization -> activation and two-fold up-sampling, are spliced along the channel dimension with the features of the previous layer, and the convolution operations are repeated, so that multi-layer feature fusion proceeds in cascade (as in the fusion sketch above). This finally yields the pre-change image features, the pre-change semantic features, the post-change image features and the post-change semantic features.
Step 6: a global and local feature fusion decoding mechanism is constructed. The pre-change image features, pre-change semantic features, post-change image features and post-change semantic features obtained in step 5 are spliced along the channel dimension, the feature channel dimension is then compressed from $4C$ back to $C$ using the fully connected layer, and after feature fusion is completed the fused features are sent, after two up-sampling operations, into the segmentation head to segment the relevant change targets.
The change analysis model training device provided by the invention is described below, and the change analysis model training device described below and the change analysis model training method described above can be referred to correspondingly.
Fig. 6 is a schematic structural diagram of a training device for a change analysis model according to an embodiment of the present application, as shown in fig. 6, including:
The obtaining module 610 is configured to obtain a plurality of training samples by using each of a pair of change sample images carrying a change text and an image change label of the pair of change sample images as one training sample, where each pair of change sample images includes a sample image before change and a sample image after change for the same region;
the analysis module 620 is configured to input, for any one of the training samples, a pair of change sample images carrying a change text in the training sample into a preset change analysis network, and output an image change detection result corresponding to the training sample;
The training module 630 is configured to calculate a loss value based on an image change detection result corresponding to the training sample and the image change label, and complete training of the preset change analysis network to obtain a trained change analysis model when the loss value is smaller than a first preset threshold;
The trained change analysis model is used for outputting an image change detection result of the category corresponding to the change text in the change image pair according to the input change text and the change image pair.
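Purely as a hypothetical illustration of how these modules cooperate at training time (the model signature, BCE loss and AdamW optimizer are assumptions; the embodiment only requires a loss value compared against a first preset threshold):

```python
import torch
import torch.nn as nn

def train_until_threshold(model, loader, threshold: float = 0.05,
                          lr: float = 1e-4, max_epochs: int = 100):
    """Train the preset change analysis network until the loss value falls
    below the first preset threshold, then return the trained model."""
    criterion = nn.BCELoss()                 # assumed supervision on change maps
    optim = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(max_epochs):
        for img_t1, img_t2, change_text, label in loader:
            pred = model(img_t1, img_t2, change_text)   # detection result
            loss = criterion(pred, label)
            optim.zero_grad()
            loss.backward()
            optim.step()
            if loss.item() < threshold:      # training is considered complete
                return model
    return model
```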
According to the device for training the change analysis model provided by the invention, the device is further used for:
Inputting the change sample image pairs into a feature extraction module in the preset change analysis network, and outputting sample image features before change and sample image features after change with different sizes;
performing text image feature fusion processing on the change sample image pair and the change text through the segmentation all model in the preset change analysis network to obtain pre-change and post-change prior masks of different sizes;
performing fusion processing on the pre-change prior mask and the post-change prior mask, together with the pre-change and post-change sample image features of different sizes, through a mask attention module in the preset change analysis network, and determining the pre-change and post-change multi-stage local dual-temporal features;
carrying out multi-scale feature fusion on the pre-change and post-change multi-stage local dual-temporal features and the pre-change and post-change sample image features of different sizes through a multi-scale feature fusion module in the preset change analysis network, to obtain the pre-change image features, the pre-change semantic features, the post-change image features and the post-change semantic features;
And performing segmentation decoding on the pre-change image features, the pre-change semantic features, the post-change image features and the post-change semantic features after splicing and feature fusion by a decoding module in the preset change analysis network, and outputting an image change detection result corresponding to the training sample.
According to the device for training the change analysis model provided by the invention, the device is further used for:
Respectively inputting the sample image before change and the sample image after change into feature extraction branches of two sharing parameters in the feature extraction module;
and respectively outputting the sample image features before and the sample image features after the change with different sizes and with the resolution from high to low through the feature extraction of a plurality of stages of the two feature extraction branches.
According to the device for training the change analysis model provided by the invention, the device is further used for:
processing the change sample image pair and the change text through the segmentation all model, and outputting a pre-change prior mask and a post-change prior mask;
performing multi-scale downsampling processing on the prior mask before change and the prior mask after change to obtain prior masks before change and prior masks after change with different sizes;
the downsampling proportion of the multi-scale downsampling process is consistent with that of the feature extraction module.
According to the device for training the change analysis model provided by the invention, the device is further used for:
respectively inputting the pre-change sample image and the post-change sample image in the change sample image pair into the picture feature encoder in the segmentation all model, and outputting pre-change sample image coding features and post-change sample image coding features;
Coding the change text into text prompt codes through a text coder in the segmentation all model;
And respectively combining the text prompt codes with the sample image coding features before the change and the sample image coding features after the change, inputting the combined codes into a fusion decoder in the segmentation all model, and respectively outputting a priori mask before the change and a priori mask after the change.
According to the device for training the change analysis model provided by the invention, the device is further used for:
performing linear mapping on the pre-change sample image features and the post-change sample image features to obtain the query values, key values and values corresponding to a mask attention mechanism;
fusing the query values, key values and values with the pre-change prior mask and the post-change prior mask through the mask attention mechanism to obtain fusion features;
and performing enhancement processing on the fusion features through a self-attention mechanism, and outputting the pre-change and post-change multi-stage local dual-temporal features.
According to the device for training the change analysis model provided by the invention, the device is further used for:
arranging the pre-change and post-change multi-stage local dual-temporal features, and the pre-change and post-change sample image features of different sizes, from high to low resolution;
performing convolution, normalization, activation and up-sampling on the lowest-resolution multi-stage local dual-temporal features and the lowest-resolution pre-change and post-change sample image features, splicing the result with the features of the next resolution, and repeating the convolution, normalization, activation and up-sampling until the features of all resolutions have been traversed, then outputting the pre-change image features, the pre-change semantic features, the post-change image features and the post-change semantic features.
According to the device for training the change analysis model provided by the invention, the device is further used for:
Splicing the image features before change, the semantic features before change, the image features after change and the semantic features after change in channel dimension, then upsampling, and compressing channel dimension of upsampling results through a full connection layer to obtain comprehensive features;
and inputting the comprehensive characteristics into a decoding head in a decoding module, and outputting an image change detection result corresponding to the training sample.
According to the device for training the change analysis model provided by the invention, the device is further used for:
Acquiring a change image pair of the same region based on the pre-change image and the post-change image of the same region at different times;
and acquiring a change text of the change image pair, inputting the change text and the change image pair into the trained change analysis model, and outputting an image change detection result of the change image pair.
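A hypothetical inference helper in the same style; the model signature follows the training sketch above, and the 0.5 binarization threshold is an assumption:

```python
import torch

def detect_changes(model, img_t1, img_t2, change_text: str = "building",
                   thresh: float = 0.5):
    """Run the trained change analysis model on a change image pair and
    binarize the output into the image change detection result."""
    model.eval()
    with torch.no_grad():
        change_map = model(img_t1, img_t2, change_text)
    return (change_map > thresh).float()     # binarized change map
```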
According to the embodiment of the application, the change text and the change sample image pairs are introduced into the training process of the change analysis model used for change detection. The change text effectively guides the model toward the change image category of interest during training, so that the trained change analysis model can more accurately identify image changes of the category corresponding to the change text, improving the detection accuracy of the change detection model and reducing false detections of change objects of categories that do not correspond to the text.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application, as shown in fig. 7, the electronic device may include: processor 710, communication interface (Communications Interface) 720, memory 730, and communication bus 740, wherein processor 710, communication interface 720, memory 730 communicate with each other via communication bus 740. Processor 710 may invoke logic instructions in memory 730 to perform a change analysis model training method comprising: taking each change sample image pair carrying a change text and an image change label of the change sample image pair as a training sample to obtain a plurality of training samples, wherein each change sample image pair comprises a sample image before change and a sample image after change aiming at the same area;
For any training sample, inputting a change sample image pair carrying a change text in the training sample into a preset change analysis network, and outputting an image change detection result corresponding to the training sample;
Calculating a loss value based on an image change detection result corresponding to the training sample and the image change label, and completing training of the preset change analysis network under the condition that the loss value is smaller than a first preset threshold value to obtain a trained change analysis model;
The trained change analysis model is used for outputting an image change detection result of the category corresponding to the change text in the change image pair according to the input change text and the change image pair.
Further, the logic instructions in the memory 730 described above may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention, essentially or the part contributing to the prior art or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or various other media capable of storing program code.
In another aspect, the present invention also provides a computer program product, the computer program product including a computer program, the computer program being storable on a non-transitory computer readable storage medium, the computer program, when executed by a processor, being capable of executing the method of training a change analysis model provided by the methods described above, the method comprising: taking each change sample image pair carrying a change text and an image change label of the change sample image pair as a training sample to obtain a plurality of training samples, wherein each change sample image pair comprises a sample image before change and a sample image after change aiming at the same area;
For any training sample, inputting a change sample image pair carrying a change text in the training sample into a preset change analysis network, and outputting an image change detection result corresponding to the training sample;
Calculating a loss value based on an image change detection result corresponding to the training sample and the image change label, and completing training of the preset change analysis network under the condition that the loss value is smaller than a first preset threshold value to obtain a trained change analysis model;
The trained change analysis model is used for outputting an image change detection result of the category corresponding to the change text in the change image pair according to the input change text and the change image pair.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform a method of training a change analysis model provided by the above methods, the method comprising: taking each change sample image pair carrying a change text and an image change label of the change sample image pair as a training sample to obtain a plurality of training samples, wherein each change sample image pair comprises a sample image before change and a sample image after change aiming at the same area;
For any training sample, inputting a change sample image pair carrying a change text in the training sample into a preset change analysis network, and outputting an image change detection result corresponding to the training sample;
Calculating a loss value based on an image change detection result corresponding to the training sample and the image change label, and completing training of the preset change analysis network under the condition that the loss value is smaller than a first preset threshold value to obtain a trained change analysis model;
The trained change analysis model is used for outputting an image change detection result of the category corresponding to the change text in the change image pair according to the input change text and the change image pair.
The apparatus embodiments described above are merely illustrative; units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (19)

1. A method of training a variation analysis model, comprising:
taking each change sample image pair carrying a change text and an image change label of the change sample image pair as a training sample to obtain a plurality of training samples, wherein each change sample image pair comprises a sample image before change and a sample image after change aiming at the same area;
For any training sample, inputting a change sample image pair carrying a change text in the training sample into a preset change analysis network, and outputting an image change detection result corresponding to the training sample;
Calculating a loss value based on an image change detection result corresponding to the training sample and the image change label, and completing training of the preset change analysis network under the condition that the loss value is smaller than a first preset threshold value to obtain a trained change analysis model;
The trained change analysis model is used for outputting an image change detection result of a category corresponding to the change text in the change image pair according to the input change text and the change image pair;
the preset change analysis network specifically comprises a feature extraction module, a segmentation all model, a mask attention module, a multi-scale feature fusion module and a decoding module;
Inputting a change sample image pair carrying a change text in the training sample into a preset change analysis network, and outputting an image change detection result corresponding to the training sample, wherein the method comprises the following steps:
Inputting the change sample image pairs into a feature extraction module in the preset change analysis network, and outputting sample image features before change and sample image features after change with different sizes;
performing text image feature fusion processing on the change sample image pair and the change text through the segmentation all model in the preset change analysis network to obtain pre-change and post-change prior masks of different sizes;
performing fusion processing on the pre-change prior mask and the post-change prior mask, together with the pre-change and post-change sample image features of different sizes, through a mask attention module in the preset change analysis network, and determining the pre-change and post-change multi-stage local dual-temporal features;
carrying out multi-scale feature fusion on the pre-change and post-change multi-stage local dual-temporal features and the pre-change and post-change sample image features of different sizes through a multi-scale feature fusion module in the preset change analysis network, to obtain the pre-change image features, the pre-change semantic features, the post-change image features and the post-change semantic features;
And performing segmentation decoding on the pre-change image features, the pre-change semantic features, the post-change image features and the post-change semantic features after splicing and feature fusion by a decoding module in the preset change analysis network, and outputting an image change detection result corresponding to the training sample.
2. The method according to claim 1, wherein inputting the pairs of change sample images into the feature extraction module in the preset change analysis network outputs pre-change sample image features and post-change sample image features of different sizes, comprising:
Respectively inputting the sample image before change and the sample image after change into feature extraction branches of two sharing parameters in the feature extraction module;
and respectively outputting the sample image features before and the sample image features after the change with different sizes and with the resolution from high to low through the feature extraction of a plurality of stages of the two feature extraction branches.
3. The method for training a variation analysis model according to claim 1, wherein the step of performing text image feature fusion processing on the change sample image pair and the change text through the segmentation all model in the preset change analysis network to obtain pre-change and post-change prior masks of different sizes comprises:
processing the change sample image pair and the change text through the segmentation all model, and outputting a pre-change prior mask and a post-change prior mask;
performing multi-scale downsampling processing on the prior mask before change and the prior mask after change to obtain prior masks before change and prior masks after change with different sizes;
the downsampling proportion of the multi-scale downsampling process is consistent with that of the feature extraction module.
4. The method for training a variation analysis model according to claim 3, wherein processing the change sample image pair and the change text through the segmentation all model to output a pre-change prior mask and a post-change prior mask comprises:
respectively inputting the pre-change sample image and the post-change sample image in the change sample image pair into the picture feature encoder in the segmentation all model, and outputting pre-change sample image coding features and post-change sample image coding features;
Coding the change text into text prompt codes through a text coder in the segmentation all model;
And respectively combining the text prompt codes with the sample image coding features before the change and the sample image coding features after the change, inputting the combined codes into a fusion decoder in the segmentation all model, and respectively outputting a priori mask before the change and a priori mask after the change.
5. The method for training a variation analysis model according to claim 1, wherein the step of determining the multi-stage local dual-temporal features before and after the variation by performing fusion processing on the prior mask before the variation and the prior mask after the variation, and the sample image features before the variation and the sample image features after the variation of different sizes by a mask attention module in the preset variation analysis network comprises:
performing linear mapping on the pre-change sample image features and the post-change sample image features to obtain the query values, key values and values corresponding to a mask attention mechanism;
fusing the query values, key values and values with the pre-change prior mask and the post-change prior mask through the mask attention mechanism to obtain fusion features;
and performing enhancement processing on the fusion features through a self-attention mechanism, and outputting the pre-change and post-change multi-stage local dual-temporal features.
6. The method for training a variation analysis model according to claim 1, wherein the multi-scale feature fusion module in the preset variation analysis network performs multi-scale feature fusion on the multi-stage local dual-temporal features before and after the variation and the sample image features before and after the variation with different sizes to obtain an image feature before the variation, a semantic feature before the variation, an image feature after the variation and a semantic feature after the variation, and includes:
arranging the pre-change and post-change multi-stage local dual-temporal features, and the pre-change and post-change sample image features of different sizes, from high to low resolution;
performing convolution, normalization, activation and up-sampling on the lowest-resolution multi-stage local dual-temporal features and the lowest-resolution pre-change and post-change sample image features, splicing the result with the features of the next resolution, and repeating the convolution, normalization, activation and up-sampling until the features of all resolutions have been traversed, then outputting the pre-change image features, the pre-change semantic features, the post-change image features and the post-change semantic features.
7. The method for training a change analysis model according to claim 1, wherein the step of outputting the image change detection result corresponding to the training sample by performing segmentation decoding after performing stitching and feature fusion on the pre-change image feature, the pre-change semantic feature, the post-change image feature and the post-change semantic feature by a decoding module in the preset change analysis network specifically comprises:
Splicing the image features before change, the semantic features before change, the image features after change and the semantic features after change in channel dimension, then upsampling, and compressing channel dimension of upsampling results through a full connection layer to obtain comprehensive features;
and inputting the comprehensive characteristics into a decoding head in a decoding module, and outputting an image change detection result corresponding to the training sample.
8. The method for training a variation analysis model according to claim 4, wherein the method for acquiring the text prompt encoding specifically comprises:

$$T = E_{text}(t_{prompt})$$

wherein $E_{text}$ is the text encoder in the segmentation all model, $T$ represents the text prompt encoding, and $t_{prompt}$ represents the entered change text.
9. The method for training a variation analysis model according to claim 4, wherein the method for acquiring the pre-change sample image features and the post-change sample image features specifically comprises:

$$F^{t} = E_{img}(I^{t}), \quad t \in \{t_1, t_2\}$$

wherein $E_{img}$ is the picture feature encoder in the segmentation all model, $F^{t}$ is a sample image feature, $I^{t}$ is a change sample image, and $t_1$ and $t_2$ indicate the pre-change time and the post-change time, respectively.
10. The method for training a variation analysis model according to claim 4, wherein the method for acquiring the pre-change prior mask and the post-change prior mask specifically comprises:

$$M^{t} = D_{fuse}(F^{t}, T), \quad t \in \{t_1, t_2\}$$

wherein $M^{t}$ represents an a priori mask, $F^{t}$ is a sample image feature, $T$ represents the text prompt encoding, and $D_{fuse}$ is the fusion decoder in the segmentation all model.
11. The method for training a variation analysis model according to claim 4, wherein the method for calculating the prior mask specifically comprises:

$$M^{t} = \mathrm{SAM}(I^{t}, t_{prompt}) = D_{fuse}\big(E_{img}(I^{t}), E_{text}(t_{prompt})\big), \quad t \in \{t_1, t_2\}$$

wherein $M^{t}$ represents an a priori mask, $M^{t_1}$ and $M^{t_2}$ represent the pre-change prior mask and the post-change prior mask respectively, $\mathrm{SAM}$ represents the segmentation all large model, $I^{t}$ represents an input sample image, $t_{prompt}$ represents the entered change text, and $F^{t_1}$ and $F^{t_2}$ represent the pre-change sample image features and the post-change sample image features, respectively.
12. The method for training a variation analysis model according to claim 5, wherein the calculation of the query value, the key value and the value specifically comprises:

$$Q^{t} = W_{Q}(F^{t}), \quad K^{t} = W_{K}(F^{t}), \quad V = W_{V}(F^{t_2} - F^{t_1})$$

wherein $Q^{t}$ is the query value, $K^{t}$ is the key value, $V$ is the value, $F^{t_2}$ is a post-change sample image feature, $F^{t_1}$ is a pre-change sample image feature, $F^{t}$ denotes the pre-change or post-change sample image features, and $W_{Q}$, $W_{K}$ and $W_{V}$ denote three different linear mapping layers.
13. The method for training a variation analysis model according to claim 12, wherein the mask attention mechanism is specifically configured as:

$$\mathrm{MaskAttn}(Q, K, V, M) = \sigma\big(Q K^{\top} + M\big)\, V$$

wherein $\sigma$ represents the normalization layer, $\mathrm{MaskAttn}$ represents the mask attention mechanism, and $M$ represents an a priori mask.
14. The method for training a variation analysis model according to claim 7, wherein the method for calculating the comprehensive feature specifically comprises:

$$F_{syn} = \mathrm{FC}_{4C \rightarrow C}\big(\mathrm{Up}\big(\mathrm{Cat}(F_{img}^{t_1}, F_{img}^{t_2}, F_{sem}^{t_1}, F_{sem}^{t_2})\big)\big)$$

wherein $F_{img}^{t_1}$, $F_{img}^{t_2}$, $F_{sem}^{t_1}$ and $F_{sem}^{t_2}$ are respectively the pre-change image features, the post-change image features, the pre-change semantic features and the post-change semantic features, $C$ represents the number of feature channels, $\mathrm{FC}$ represents a fully connected layer, $\mathrm{Up}$ represents up-sampling, and $\mathrm{Cat}(\cdot)$ represents the stitching operation.
15. The method for training a variation analysis model according to claim 14, wherein the image change detection result corresponding to the training sample is specifically obtained as:

$$\hat{Y} = \phi(F_{syn})$$

wherein $\phi$ represents a linear mapping layer serving as the segmentation decoding head, $F_{syn}$ is the comprehensive feature, and $\hat{Y}$ is the image change detection result.
16. The method for training a variation analysis model according to claim 1, further comprising, after the step of completing the training of the preset variation analysis network to obtain a trained variation analysis model:
Acquiring a change image pair of the same region based on the pre-change image and the post-change image of the same region at different times;
and acquiring a change text of the change image pair, inputting the change text and the change image pair into the trained change analysis model, and outputting an image change detection result of the change image pair.
17. A variation analysis model training apparatus, comprising:
The acquisition module is used for taking each change sample image pair carrying a change text and an image change label of the change sample image pair as a training sample to acquire a plurality of training samples, wherein each change sample image pair comprises a sample image before change and a sample image after change aiming at the same area;
the analysis module is used for inputting a change sample image pair carrying a change text in the training sample into a preset change analysis network for any training sample, and outputting an image change detection result corresponding to the training sample;
The training module is used for calculating a loss value based on an image change detection result corresponding to the training sample and the image change label, and completing training of the preset change analysis network under the condition that the loss value is smaller than a first preset threshold value to obtain a trained change analysis model;
The trained change analysis model is used for outputting an image change detection result of a category corresponding to the change text in the change image pair according to the input change text and the change image pair;
the preset change analysis network specifically comprises a feature extraction module, a segmentation all model, a mask attention module, a multi-scale feature fusion module and a decoding module;
the device is also for:
Inputting the change sample image pairs into a feature extraction module in the preset change analysis network, and outputting sample image features before change and sample image features after change with different sizes;
performing text image feature fusion processing on the change sample image pair and the change text through the segmentation all model in the preset change analysis network to obtain pre-change and post-change prior masks of different sizes;
performing fusion processing on the pre-change prior mask and the post-change prior mask, together with the pre-change and post-change sample image features of different sizes, through a mask attention module in the preset change analysis network, and determining the pre-change and post-change multi-stage local dual-temporal features;
carrying out multi-scale feature fusion on the pre-change and post-change multi-stage local dual-temporal features and the pre-change and post-change sample image features of different sizes through a multi-scale feature fusion module in the preset change analysis network, to obtain the pre-change image features, the pre-change semantic features, the post-change image features and the post-change semantic features;
And performing segmentation decoding on the pre-change image features, the pre-change semantic features, the post-change image features and the post-change semantic features after splicing and feature fusion by a decoding module in the preset change analysis network, and outputting an image change detection result corresponding to the training sample.
18. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the change analysis model training method of any of claims 1 to 16 when the program is executed by the processor.
19. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the variation analysis model training method of any of claims 1 to 16.
CN202410134211.8A 2024-01-31 2024-01-31 Method and device for training change analysis model, electronic equipment and storage medium Active CN117671432B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410134211.8A CN117671432B (en) 2024-01-31 2024-01-31 Method and device for training change analysis model, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410134211.8A CN117671432B (en) 2024-01-31 2024-01-31 Method and device for training change analysis model, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN117671432A CN117671432A (en) 2024-03-08
CN117671432B true CN117671432B (en) 2024-05-07

Family

ID=90064586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410134211.8A Active CN117671432B (en) 2024-01-31 2024-01-31 Method and device for training change analysis model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117671432B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117975484B (en) * 2024-03-27 2024-07-05 腾讯科技(深圳)有限公司 Training method of change detection model, change detection method, device and equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114022788A (en) * 2022-01-05 2022-02-08 长沙理工大学 Remote sensing image change detection method and device, computer equipment and storage medium
CN117292255A (en) * 2023-09-03 2023-12-26 西北工业大学 Method for detecting damage change of fine-granularity building by adapting SAM model
CN117422711A (en) * 2023-12-14 2024-01-19 武汉理工大学三亚科教创新园 Ocean vortex hyperspectral change detection method, device, equipment and medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113269237B (en) * 2021-05-10 2022-12-27 青岛理工大学 Assembly change detection method, device and medium based on attention mechanism

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114022788A (en) * 2022-01-05 2022-02-08 长沙理工大学 Remote sensing image change detection method and device, computer equipment and storage medium
CN117292255A (en) * 2023-09-03 2023-12-26 西北工业大学 Method for detecting damage change of fine-granularity building by adapting SAM model
CN117422711A (en) * 2023-12-14 2024-01-19 武汉理工大学三亚科教创新园 Ocean vortex hyperspectral change detection method, device, equipment and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Transformer-Based Siamese Network for Change Detection;Wele Gedara Chaminda Bandara等;https://arxiv.org/abs/2201.01293;20220902;第1-5页 *
SEGMENT CHANGE MODEL (SCM) FOR UNSUPERVISED CHANGE DETECTION IN VHR REMOTE SENSING IMAGES: A CASE STUDY OF BUILDINGS;Xiaoliang Tan等;https://arxiv.org/pdf/2312.16410;20231227;第1-4页 *

Also Published As

Publication number Publication date
CN117671432A (en) 2024-03-08

Similar Documents

Publication Publication Date Title
Wu et al. Object-compositional neural implicit surfaces
CN111079683B (en) Remote sensing image cloud and snow detection method based on convolutional neural network
CN117671432B (en) Method and device for training change analysis model, electronic equipment and storage medium
CN113850916A (en) Model training and point cloud missing completion method, device, equipment and medium
CN112258526B (en) CT kidney region cascade segmentation method based on dual attention mechanism
CN116524361A (en) Remote sensing image change detection network and detection method based on double twin branches
CN114140831B (en) Human body posture estimation method and device, electronic equipment and storage medium
CN116797787A (en) Remote sensing image semantic segmentation method based on cross-modal fusion and graph neural network
CN115984714B (en) Cloud detection method based on dual-branch network model
CN112861795A (en) Method and device for detecting salient target of remote sensing image based on multi-scale feature fusion
CN114091628A (en) Three-dimensional point cloud up-sampling method and system based on double branch network
CN114821050A (en) Named image segmentation method based on transformer
CN111507262A (en) Method and apparatus for detecting living body
CN115588013A (en) Image segmentation method based on full-scale fusion and flow field attention
CN113781164A (en) Virtual fitting model training method, virtual fitting method and related device
CN114037893A (en) High-resolution remote sensing image building extraction method based on convolutional neural network
CN115049919B (en) Remote sensing image semantic segmentation method and system based on attention regulation
CN113920208A (en) Image processing method and device, computer readable storage medium and electronic device
CN115018910A (en) Method and device for detecting target in point cloud data and computer readable storage medium
CN117409431B (en) Multi-mode large language model training method, electronic equipment and storage medium
CN116977750B (en) Construction method and classification method of land covering scene classification model
CN117876679A (en) Remote sensing image scene segmentation method based on convolutional neural network
CN115937516A (en) Image semantic segmentation method and device, storage medium and terminal
CN114549992A (en) Cross-resolution building image extraction method and device
JP7190147B1 (en) 3D shape descriptor extractor manufacturing method, 3D shape search method and 3D shape search system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant