CN118096922A - Method for generating map based on style migration and remote sensing image - Google Patents

Method for generating map based on style migration and remote sensing image

Info

Publication number
CN118096922A
Authority
CN
China
Prior art keywords
map
image
loss
remote sensing
sensing image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410299406.8A
Other languages
Chinese (zh)
Inventor
王奔
丁志鹏
孙水发
冯阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Normal University
Original Assignee
Hangzhou Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Normal University filed Critical Hangzhou Normal University
Priority to CN202410299406.8A priority Critical patent/CN118096922A/en
Publication of CN118096922A publication Critical patent/CN118096922A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/001 Texturing; Colouring; Generation of texture or colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0475 Generative networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems


Abstract

The invention discloses a method for generating a map based on style migration and remote sensing images. The constructed map generation network model comprises an encoder, a style conversion module and a decoder. A fusion of a multi-head self-attention mechanism and a residual module serves as the style converter, capturing long-range dependencies among map features. In the upsampling stage of the decoder, the method combines conventional transposed convolution with the Carafe operator so that neighborhood information is better exploited and upsampling quality is improved. The map generation network model is optimized and trained, a remote sensing image is input into the trained optimal model, and the corresponding map image is output. The invention provides more accurate feature information for the upsampling operation, which is carried out through upsampling kernel prediction and feature reassembly, so that the generated map shows a clear visual improvement in roads, buildings, edge details and the color saturation of map content. The problems of lost detail and unclear content in prior-art map generation are thereby solved.

Description

Method for generating map based on style migration and remote sensing image
Technical Field
The invention belongs to the technical field of map making, and relates to a method for generating a map based on style migration and remote sensing images.
Background
Maps play an important and indispensable role in people's daily life and work. They provide not only spatial positioning and navigation functions but also rich geographic information and spatial data resources. Conventional map-making methods typically rely on manual surveys and vehicle-mounted GPS track data; however, these methods have inherent limitations in the map-updating process. First, conventional map making requires substantial human resources and time, so maps are updated slowly. Second, manual surveys may introduce human error, which in turn leads to differences between the map and the actual ground. In addition, vehicle-mounted GPS trajectory data may be limited by the environment and the equipment and therefore cannot reflect the real situation completely and accurately. Considering the frequent rebuilding of ground structures and roads and the occurrence of natural disasters, actual ground conditions often no longer match the existing map. Thus, there is a need for a map generation method that is both fast and accurate.
The idea of style migration offers a solution to the above problems. In recent years, with improvements in the computing performance of hardware and the rapid development of deep learning, deep learning has been widely applied across computer vision: network models based on deep learning keep emerging, various algorithms are continuously optimized and upgraded, and great results have been achieved in practical applications. Thanks to this, many scholars have begun to use neural networks for style migration of images.
In recent years, many researchers have proposed map generation methods based on GAN networks, for example: the fully supervised model Pix2Pix, a conditional generative adversarial network that realizes one-to-one image style migration through supervised training on paired images, but relies on paired data and even semantic labels; in many practical scenarios, however, it is very difficult to obtain accurately paired data and labels. The unsupervised model CycleGAN ensures the consistency and accuracy of the conversion through a cycle-consistency loss function and needs no paired training data, but, lacking supervised training on paired data, its actual generation results are not very satisfactory. The semi-supervised model SmapGAN, based on a semi-supervised generative adversarial network, realizes style migration between regional remote sensing images and maps; it designs an image gradient L1 loss and an image gradient structure loss to generate stylized map tiles with a global topological relation and detailed object edge curves. Although SmapGAN combines the advantages of supervised and unsupervised models, its ResBlock-based style converter mainly uses local receptive fields to capture spatially local relations in the input data, so long-range dependencies remain poorly handled. In addition, conventional transposed convolution is affected by the local receptive field and padding of the convolution kernel, which easily causes blurring and information loss during upsampling.
Because of these defects of the supervised, unsupervised and semi-supervised models, problems such as lost map detail and unclear content arise when generating maps from remote sensing images. Therefore, a method for generating a map based on style migration and remote sensing images is provided.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a method for generating a map based on style migration and remote sensing images.
The method specifically comprises the following steps:
Step 1: Firstly, divide any public remote sensing image dataset into a training set, a verification set and a test set according to a set proportion. The dataset comprises a number of paired images, each containing a remote sensing image and the corresponding map image. Data enhancement processing is applied to the training set and the verification set but not to the test set; the data enhancement includes flip and rotate operations. The training set is used to learn features, the verification set assists in tuning the model, and the test set evaluates the accuracy of the model;
Step 2: constructing a map generation network model, wherein the model comprises an encoder, a style conversion module and a decoder;
The encoder comprises 1 convolution layer with a convolution kernel size of 7×7, a batch normalization layer, a ReLU activation function and 2 downsampling layers, where each downsampling layer comprises a convolution layer with a convolution kernel size of 3×3, a batch normalization layer and a ReLU activation function;
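For illustration only, a minimal PyTorch sketch of an encoder with this layout might look as follows (the 3-channel input and the 64-channel base width are assumptions, not values fixed by this description):

    import torch
    import torch.nn as nn

    class Encoder(nn.Module):
        """7x7 conv + BN + ReLU, then two stride-2 3x3 downsampling blocks (assumed widths)."""
        def __init__(self, in_ch=3, base=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(in_ch, base, kernel_size=7, stride=1, padding=3),
                nn.BatchNorm2d(base), nn.ReLU(inplace=True),
                nn.Conv2d(base, base * 2, kernel_size=3, stride=2, padding=1),      # downsampling layer 1
                nn.BatchNorm2d(base * 2), nn.ReLU(inplace=True),
                nn.Conv2d(base * 2, base * 4, kernel_size=3, stride=2, padding=1),  # downsampling layer 2
                nn.BatchNorm2d(base * 4), nn.ReLU(inplace=True),
            )

        def forward(self, x):      # x: (B, 3, 256, 256)
            return self.net(x)     # Feature_R: (B, 256, 64, 64)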
The style conversion module combines residual blocks with a multi-head self-attention mechanism as the main structure of the style converter. The feature map Feature_R obtained after encoder processing passes sequentially through 9 residual blocks, each composed of a 3×3 convolution layer, a ReLU activation function and a 3×3 convolution layer, to perform the preliminary style conversion. In the last layer of the style converter, a 1×1 convolution layer, a ReLU activation function, a 4-head self-attention module, a ReLU activation function, a 1×1 convolution layer and a ReLU activation function are used in sequence to output Feature_M.
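Under the same caveat, the style converter (9 residual blocks followed by the 1×1 conv / 4-head self-attention stack) could be sketched as below; treating every spatial position as a token for nn.MultiheadAttention is an implementation assumption:

    import torch.nn as nn

    class ResBlock(nn.Module):
        """3x3 conv -> ReLU -> 3x3 conv, with a skip connection."""
        def __init__(self, ch=256):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, ch, 3, padding=1),
            )

        def forward(self, x):
            return x + self.body(x)

    class StyleConverter(nn.Module):
        """9 ResBlocks, then 1x1 conv, ReLU, 4-head self-attention, ReLU, 1x1 conv, ReLU."""
        def __init__(self, ch=256, heads=4):
            super().__init__()
            self.res = nn.Sequential(*[ResBlock(ch) for _ in range(9)])
            self.pre = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.ReLU(inplace=True))
            self.attn = nn.MultiheadAttention(embed_dim=ch, num_heads=heads, batch_first=True)
            self.post = nn.Sequential(nn.ReLU(inplace=True), nn.Conv2d(ch, ch, 1), nn.ReLU(inplace=True))

        def forward(self, x):
            x = self.pre(self.res(x))
            b, c, h, w = x.shape
            seq = x.flatten(2).transpose(1, 2)   # (B, H*W, C): one token per pixel
            seq, _ = self.attn(seq, seq, seq)    # long-range dependencies across all positions
            x = seq.transpose(1, 2).reshape(b, c, h, w)
            return self.post(x)                  # Feature_M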
The decoder comprises an upsampling module and a convolution layer with a convolution kernel size of 7×7. A conventional 3×3 transposed convolution taking Feature_M as input is used as the upsampling method of the first layer, expressed as: F_1 = TC(Feature_M), where TC denotes the transposed-convolution operation and F_1 denotes the result after the 3×3 transposed convolution. The Carafe upsampling operator is used at the second layer; the Carafe module consists of three convolution layers and is divided into an upsampling kernel prediction module and a feature reassembly module.
The Carafe module processes features as follows:
First, the number of channels of the input feature map is compressed from C to C_n using a 1×1 convolution layer, expressed as: F_2 = C_1(F_1), where C_1 denotes the channel compression operation and F_2 denotes the result of channel-compressing F_1.
Next, upsampling kernel prediction is performed with a 3×3 convolution kernel, where the input is of size H×W×C_n and the output is of size H×W×(σ²·k_up²). The parameters are set to σ = 2 and k_up = 3, where σ denotes the upsampling factor and k_up the reassembly kernel size. This operation enlarges the receptive field of the encoder so that context information can be fully exploited over a larger area. The channel dimension is then unfolded into the spatial dimension to obtain a tensor of shape σH×σW×k_up², and each reassembly kernel of size k_up×k_up is spatially normalized. This is expressed as: F_3 = Softmax(Unfolding(KPM(F_2))), where KPM denotes the upsampling kernel prediction operation, Unfolding denotes spatially expanding its output, Softmax denotes normalizing each kernel, and F_3 denotes the result of upsampling kernel prediction.
In the feature reassembly module, a dot product is taken between each predicted upsampling kernel and the k_up×k_up region centered on the corresponding feature point of the input feature map, realizing feature reassembly. Finally, the channels are compressed using a convolution layer with a 1×1 kernel and Feature_CM is output. This is expressed as: Feature_CM = C_2(CARM(N(F_1i, k_up), F_3i)), where CARM denotes the feature reassembly operation, N(F_1i, k_up) denotes the k_up×k_up region centered on feature point F_1i of the input feature map, F_3i denotes the upsampling kernel predicted for that point by the kernel prediction module, C_2 denotes the channel compression operation, and Feature_CM denotes the result output by TC-Carafe.
Finally, Feature_CM is fed into a 7×7 convolution layer followed by a Tanh activation to output the generated map image.
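A hedged PyTorch sketch of this TC-Carafe path is shown below; σ = 2 and k_up = 3 follow the text, while the channel widths (256 in, halved by the transposed convolution) and the compressed width C_n = 64 are assumptions:

    import torch.nn as nn
    import torch.nn.functional as F

    class TCCarafe(nn.Module):
        """Layer 1: 3x3 transposed conv. Layer 2: Carafe (kernel prediction + feature reassembly)."""
        def __init__(self, ch=256, cn=64, sigma=2, k_up=3):
            super().__init__()
            self.tc = nn.ConvTranspose2d(ch, ch // 2, 3, stride=2, padding=1, output_padding=1)
            c = ch // 2
            self.compress = nn.Conv2d(c, cn, 1)                                   # C -> C_n
            self.kpm = nn.Conv2d(cn, sigma * sigma * k_up * k_up, 3, padding=1)   # kernel prediction
            self.out = nn.Conv2d(c, c // 2, 1)                                    # final 1x1 compression
            self.sigma, self.k_up = sigma, k_up

        def forward(self, feat_m):
            f1 = self.tc(feat_m)                                # F_1 = TC(Feature_M)
            s, k = self.sigma, self.k_up
            b, c, h, w = f1.shape
            kernels = self.kpm(self.compress(f1))               # (B, s^2*k^2, H, W)
            kernels = F.pixel_shuffle(kernels, s)               # unfold to space: (B, k^2, sH, sW)
            kernels = F.softmax(kernels, dim=1)                 # normalize each k x k kernel
            patches = F.unfold(f1, k, padding=k // 2).view(b, c * k * k, h, w)
            patches = F.interpolate(patches, scale_factor=s, mode='nearest')
            patches = patches.view(b, c, k * k, s * h, s * w)
            f_cm = (patches * kernels.unsqueeze(1)).sum(dim=2)  # dot product: feature reassembly
            return self.out(f_cm)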
Step 3: inputting the training set and the verification set in the step 1 into the network model built in the step 2 for optimization training to obtain an optimal model;
The map output in step 2 is judged by a discriminator. The discriminator adopts PatchGAN, which decomposes the task of judging the whole image into judging a plurality of local regions (patches) of the image; this helps the network understand the local structure and details of the image more carefully;
The loss function used in the training process is as follows:
1) Topology consistency loss: L_topo = L_graL1 + L_grastr, where L_graL1 is the image gradient L1 loss and L_grastr is the image gradient structure loss.
Here, L_graL1 = E_{x~p(x)} ||G(G_X→Y(x)) - G(y)||_1, and L_grastr = E_{x~p(x)} [1 - (1/(M+N)) (Σ_{j=1..N} (cov(G_j(y), G_j(G_X→Y(x))) + C_1) / (σ(G_j(y))·σ(G_j(G_X→Y(x))) + C_1) + Σ_{i=1..M} (cov(G_i(y), G_i(G_X→Y(x))) + C_2) / (σ(G_i(y))·σ(G_i(G_X→Y(x))) + C_2))]. In these formulas, x~p(x) denotes sampling from the remote sensing image samples; C_1 and C_2 are constant terms; M and N denote that the input image has M rows and N columns; cov(G_j(y), G_j(G_X→Y(x))) is the covariance of G_j(y) and G_j(G_X→Y(x)); σ(G_j(y)) and σ(G_j(G_X→Y(x))) are their standard deviations; the row terms cov(G_i(y), G_i(G_X→Y(x))), σ(G_i(y)) and σ(G_i(G_X→Y(x))) are defined likewise. For the 255×255 gradient images of the real map and the generated map, the pixel matrices G(y) and G(G_X→Y(x)) have 255 rows and 255 columns. For the j-th column (i-th row), the pixel values of the points on that column (row) form an M-dimensional (N-dimensional) random variable G_j (G_i). G(y) is the gradient image of the real map y, and G(G_X→Y(x)) is the gradient image of the generated fake map G_X→Y(x).
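As an illustrative reading of these terms, the two gradient losses are sketched below; the forward-difference gradient operator, the constants c1 = c2 = 1e-4, and the simplified averaging of the column/row terms are assumptions:

    import torch
    import torch.nn.functional as F

    def gradient(img):
        """Forward-difference gradient magnitude; a 256x256 map gives a 255x255 gradient image."""
        dx = img[..., 1:, 1:] - img[..., 1:, :-1]
        dy = img[..., 1:, 1:] - img[..., :-1, 1:]
        return dx.abs() + dy.abs()

    def gradient_l1_loss(fake, real):
        """L_graL1: pixel-wise L1 between gradient images of generated and real maps."""
        return F.l1_loss(gradient(fake), gradient(real))

    def gradient_structure_loss(fake, real, c1=1e-4, c2=1e-4):
        """L_grastr sketch: covariance similarity of gradient-image columns and rows,
        with the (1/(M+N)) weighting simplified to an average of the two terms."""
        gf, gr = gradient(fake), gradient(real)

        def term(a, b, dim, c):
            mu_a, mu_b = a.mean(dim, keepdim=True), b.mean(dim, keepdim=True)
            cov = ((a - mu_a) * (b - mu_b)).mean(dim)
            return ((cov + c) / (a.std(dim) * b.std(dim) + c)).mean()

        # columns vary along the row axis (-2); rows vary along the column axis (-1)
        return 1.0 - 0.5 * (term(gr, gf, -2, c1) + term(gr, gf, -1, c2))

    def topology_loss(fake, real):
        """L_topo = L_graL1 + L_grastr."""
        return gradient_l1_loss(fake, real) + gradient_structure_loss(fake, real)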
2) Content loss: the aim is to ensure that the generated map is similar in content to the Ground Truth. Here L_cycX and L_cycY are the cycle losses, and L_dirX and L_dirY are the direct losses. In the unsupervised phase, the cycle losses are employed; in the supervised phase, the direct losses are employed.
Here, L_cycX = λ·E_{x~p(x)} ||G_Y→X(G_X→Y(x)) - x||_1 represents the cycle loss of the remote sensing image, where λ is the fine-tuning coefficient and L1u represents the L1 loss in the unsupervised stage; the cycle loss computes the pixel-wise L1 difference so that the generated map and the remote sensing image keep cycle consistency in content. x~p(x) represents sampling from the remote sensing image samples, and G_Y→X(G_X→Y(x)) - x computes the cycle loss between the fake remote sensing image, obtained by passing the fake map image G_X→Y(x) through G_Y→X, and the real remote sensing image x.
Here, L_cycY = E_{y~p(y)} ||G_X→Y(G_Y→X(y)) - y||_1 + L_topo represents the cycle loss of the map, where y~p(y) represents sampling from the map samples. The topology consistency loss L_topo is introduced to keep the topology of the generated image cyclically consistent with that of the target image. G_X→Y(G_Y→X(y)) - y computes the cycle loss between the fake map image, obtained by passing the fake remote sensing image G_Y→X(y) through G_X→Y, and the real map image y.
The direct loss of map to remote sensing image, L_dirX = λ·E_{(x,y)} ||G_Y→X(y) - x||_1, maintains content consistency through the L1 loss function, where λ is the fine-tuning coefficient, L1 represents the L1 loss, and the term is the pixel L1 loss. G_Y→X(y) - x computes the loss between the generated fake remote sensing image and the real remote sensing image.
The direct loss of remote sensing image to map, L_dirY = λ·E_{(x,y)} ||G_X→Y(x) - y||_1 + L_topo, keeps the generated map consistent with the remote sensing image in topology through the L_topo loss. G_X→Y(x) - y computes the loss between the generated fake map image and the real map image.
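These content losses can be sketched as follows; the generators G_xy and G_yx are placeholders, λ = 10 is an assumed value of the fine-tuning coefficient, and topo_loss stands for the topology consistency loss sketched earlier:

    import torch.nn.functional as F

    def cycle_losses(x, y, G_xy, G_yx, topo_loss, lam=10.0):
        """Unsupervised stage: L_cycX + L_cycY on x (remote sensing) and y (map)."""
        fake_map = G_xy(x)
        fake_rs = G_yx(y)
        l_cyc_x = lam * F.l1_loss(G_yx(fake_map), x)                        # ||G_yx(G_xy(x)) - x||_1
        l_cyc_y = F.l1_loss(G_xy(fake_rs), y) + topo_loss(G_xy(fake_rs), y)
        return l_cyc_x + l_cyc_y

    def direct_losses(x, y, G_xy, G_yx, topo_loss, lam=10.0):
        """Supervised stage: L_dirX + L_dirY on a paired sample (x, y)."""
        l_dir_x = lam * F.l1_loss(G_yx(y), x)                               # map -> remote sensing
        l_dir_y = lam * F.l1_loss(G_xy(x), y) + topo_loss(G_xy(x), y)       # remote sensing -> map
        return l_dir_x + l_dir_y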
3) Adversarial loss: the discriminator distinguishes the difference between the generated image and the real image. The purpose of the generator G is to minimize the loss function value, and the purpose of the discriminator D is to maximize it. The formulas are as follows:
The adversarial loss of remote sensing image to map is L_GAN(G_X→Y, D_Y) = E_{y~p(y)}[log D_Y(y)] + E_{x~p(x)}[log(1 - D_Y(G_X→Y(x)))], where G_X→Y is the generator from remote sensing images to maps and the generated image is input to the discriminator D_Y for discrimination.
The adversarial loss of map to remote sensing image is L_GAN(G_Y→X, D_X) = E_{x~p(x)}[log D_X(x)] + E_{y~p(y)}[log(1 - D_X(G_Y→X(y)))], where G_Y→X is the generator from maps to remote sensing images and the generated image is sent to the discriminator D_X for discrimination.
4) Identity loss: used to ensure consistency between the converted image and the original image. For example, a map fed into the generator G_X→Y should remain as consistent as possible with the input map, i.e. preserve the content and color of the map itself. The formula is as follows: L_identity = E_{y~p(y)} ||G_X→Y(y) - y||_1 + E_{x~p(x)} ||G_Y→X(x) - x||_1.
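The adversarial and identity terms can likewise be sketched; reading the log-loss formulas as binary cross-entropy over the PatchGAN logits is an implementation assumption:

    import torch
    import torch.nn.functional as F

    def adversarial_loss_D(D, real, fake):
        """Discriminator side: push real patches toward 1 and generated patches toward 0."""
        pred_real, pred_fake = D(real), D(fake.detach())
        return (F.binary_cross_entropy_with_logits(pred_real, torch.ones_like(pred_real)) +
                F.binary_cross_entropy_with_logits(pred_fake, torch.zeros_like(pred_fake)))

    def adversarial_loss_G(D, fake):
        """Generator side: make the discriminator score generated patches as real."""
        pred_fake = D(fake)
        return F.binary_cross_entropy_with_logits(pred_fake, torch.ones_like(pred_fake))

    def identity_loss(x, y, G_xy, G_yx):
        """A map fed to G_xy (and a remote sensing image fed to G_yx) should pass through unchanged."""
        return F.l1_loss(G_xy(y), y) + F.l1_loss(G_yx(x), x)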
Step 4: input the remote sensing image into the optimal model obtained after the training of step 3, and output the corresponding map image.
Compared with the prior art, the invention has remarkable advantages. First, the long-range dependencies between features are captured by combining a residual module with a multi-head self-attention mechanism (Multi-Headed Self-Attention), providing more accurate feature information for the upsampling operation. Second, a novel upsampling method combining conventional transposed convolution with the Carafe operator is provided; upsampling is performed through upsampling kernel prediction and feature reassembly, so the generated map is clearer and more complete.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of a map generation network module of the present invention;
FIG. 3 is a block diagram of a style converter according to the present invention;
FIG. 4 is a diagram of an upsampling module according to the present invention;
FIG. 5 is a block diagram of the discriminator according to the present invention;
FIG. 6 is a diagram showing the visual-effect comparison between the map generation method based on remote sensing images of the present invention and other methods.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific implementation steps.
As shown in fig. 1, a map generating method based on remote sensing images specifically includes the following steps:
Step 1: in this embodiment, the public New York City dataset is used and divided into a training set, a validation set and a test set in the ratio 8:1:1. The training set and the validation set are subjected to data enhancement processing. The zoom level is 16, the dataset contains 2194 remote sensing images with their corresponding maps, and the image size is 256×256.
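As a purely illustrative sketch (the rs/ and map/ directory names and the helper functions are assumptions), the 8:1:1 split and the paired flip/rotate enhancement could look like:

    import random
    from pathlib import Path
    import torchvision.transforms.functional as TF

    def split_dataset(pairs, ratios=(0.8, 0.1, 0.1), seed=0):
        """Split paired (remote sensing, map) samples into train/val/test at 8:1:1."""
        pairs = list(pairs)
        random.Random(seed).shuffle(pairs)
        n_train, n_val = int(len(pairs) * ratios[0]), int(len(pairs) * ratios[1])
        return pairs[:n_train], pairs[n_train:n_train + n_val], pairs[n_train + n_val:]

    def augment_pair(rs_img, map_img):
        """Apply the same flip/rotation to both images of a pair (train/val sets only)."""
        if random.random() < 0.5:
            rs_img, map_img = TF.hflip(rs_img), TF.hflip(map_img)
        angle = random.choice([0, 90, 180, 270])
        return TF.rotate(rs_img, angle), TF.rotate(map_img, angle)

    pairs = list(zip(sorted(Path("rs").glob("*.png")), sorted(Path("map").glob("*.png"))))
    train, val, test = split_dataset(pairs)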
Step 2: constructing a map generation network model, as shown in fig. 2, wherein the model comprises an encoder, a style conversion module and a decoder;
the encoder comprises 1 convolution layer with a convolution kernel size of 7×7, a batch normalization layer (BN), a ReLU activation function and 2 downsampling layers (each comprising a convolution layer with a convolution kernel size of 3×3, a batch normalization layer (BN) and a ReLU activation function). A C×H×W remote sensing image is input, and the feature-map size and redundant information are reduced by the 7×7 convolution layer and the two 3×3 downsampling layers of the encoder.
As shown in fig. 3, the style conversion module combines residual blocks with a multi-head self-attention mechanism (MHSA) as the main structure of the style converter. The feature map Feature_R obtained after encoder processing passes sequentially through 9 residual blocks, each composed of a 3×3 convolution layer, a ReLU activation function and a 3×3 convolution layer, to perform the preliminary style conversion. In the last layer of the style converter, a 1×1 convolution layer, a ReLU activation function, a 4-head self-attention module, a ReLU activation function, a 1×1 convolution layer and a ReLU activation function are used in sequence to output Feature_M. By introducing the multi-head self-attention mechanism, feature representations can be learned in parallel over multiple subspaces to capture long-range dependencies between pixels in the image, thereby increasing the nonlinear capability of the model.
The decoder comprises an upsampling module and a convolution layer with a convolution kernel size of 7×7. As shown in fig. 4, a conventional 3×3 transposed convolution taking Feature_M as input is used as the first-layer upsampling method, which is expressed as:
F_1 = TC(Feature_M);
where TC denotes the transposed-convolution operation and F_1 denotes the result after the 3×3 transposed convolution.
The Carafe upsampling operator is used at the second layer to better improve the detail and sharpness of image generation. Carafe upsampling has the advantages of a large receptive field, light weight and high computation speed. The Carafe module consists of three convolution layers and is divided into an upsampling kernel prediction module and a feature reassembly module. In the upsampling kernel prediction module, the input data is a feature map of size C×H×W, where C is the number of channels, H the number of pixels in the vertical dimension and W the number of pixels in the horizontal dimension. First, the number of channels of the input feature map is compressed from C to C_n using a 1×1 convolution layer, expressed as:
F_2 = C_1(F_1);
where C_1 denotes the channel compression operation and F_2 denotes the result of channel-compressing F_1.
Next, upsampling kernel prediction is performed with a 3×3 convolution kernel, where the input is of size H×W×C_n and the output is of size H×W×(σ²·k_up²). The parameters are set to σ = 2 and k_up = 3, where σ denotes the upsampling factor and k_up the reassembly kernel size. This operation enlarges the receptive field of the encoder so that context information can be fully exploited over a larger area. The channel dimension is then unfolded into the spatial dimension to obtain a tensor of shape σH×σW×k_up², and each reassembly kernel of size k_up×k_up is spatially normalized. This is expressed as:
F_3 = Softmax(Unfolding(KPM(F_2)));
where KPM denotes the upsampling kernel prediction operation, Unfolding denotes spatially expanding its output, Softmax denotes normalizing each kernel, and F_3 denotes the result of upsampling kernel prediction.
In the feature reassembly module, a dot product is taken between each predicted upsampling kernel and the k_up×k_up region centered on the corresponding feature point of the input feature map, realizing feature reassembly. Finally, the channels are compressed using a convolution layer with a 1×1 kernel and Feature_CM is output. This is expressed as:
Feature_CM = C_2(CARM(N(F_1i, k_up), F_3i));
where CARM denotes the feature reassembly operation, N(F_1i, k_up) denotes the k_up×k_up region centered on feature point F_1i of the input feature map, F_3i denotes the upsampling kernel predicted for that point by the kernel prediction module, C_2 denotes the channel compression operation, and Feature_CM denotes the result output by TC-Carafe.
Finally, Feature_CM is fed into a 7×7 convolution layer followed by a Tanh activation to output the generated map image.
Step 3: send the training set and the verification set obtained in step 1 into the network built in step 2, train with the adjusted parameters, and save the optimal model;
Specifically, during training, the batch size was set to 1 and the network was optimized using the Adam optimizer with β_1 = 0.5 and β_2 = 0.999. A linear learning-rate strategy is used: the initial learning rate is 0.0002, kept constant from epoch 1 to 100 and then decayed linearly to 0 from epoch 101 to 200. All models were trained for 200 epochs.
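A minimal sketch of this optimization schedule, with a placeholder module standing in for the full map generation network:

    import torch
    import torch.nn as nn

    net = nn.Conv2d(3, 3, 3, padding=1)  # placeholder for the map generation network
    opt = torch.optim.Adam(net.parameters(), lr=2e-4, betas=(0.5, 0.999))

    def lr_factor(epoch, keep=100, total=200):
        """Factor 1.0 for epochs 1..100, then linear decay to 0 by epoch 200."""
        return 1.0 if epoch <= keep else max(0.0, (total - epoch) / (total - keep))

    sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=lambda e: lr_factor(e + 1))

    for epoch in range(200):
        # ... one pass over the training set with batch size 1, computing the losses above ...
        opt.step()
        sched.step()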
As shown in fig. 5, when the image generated by the generator is input to the PatchGAN discriminator, the discrimination task is subdivided into judging a plurality of local regions (patches) of the image. This decomposition helps the discriminator understand the local structure and details of the map image more finely. Specifically, after the image passes through 5 convolution layers of 4×4 with LeakyReLU and BN layers, the input is mapped to a 70×70 matrix, and true/false discrimination is performed for each patch of that matrix. The discriminator judges the authenticity of each local region one by one: a patch judged true is marked 1 and a patch judged false is marked 0, and finally the probability that the whole image is true is calculated. This discrimination mode pushes the generator to better capture the local characteristics of the map, improves the realism and detail expression of the generated map, and optimizes model training through the adversarial loss.
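Following the five-layer description, such a PatchGAN discriminator might be sketched as below (channel widths are assumed, mirroring the common 70×70 PatchGAN layout; omitting BN on the first layer is a common choice):

    import torch.nn as nn

    class PatchDiscriminator(nn.Module):
        """Five 4x4 conv layers with LeakyReLU/BN; each output logit scores one local patch."""
        def __init__(self, in_ch=3, base=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(in_ch, base, 4, stride=2, padding=1),
                nn.LeakyReLU(0.2, inplace=True),
                nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
                nn.BatchNorm2d(base * 2), nn.LeakyReLU(0.2, inplace=True),
                nn.Conv2d(base * 2, base * 4, 4, stride=2, padding=1),
                nn.BatchNorm2d(base * 4), nn.LeakyReLU(0.2, inplace=True),
                nn.Conv2d(base * 4, base * 8, 4, stride=1, padding=1),
                nn.BatchNorm2d(base * 8), nn.LeakyReLU(0.2, inplace=True),
                nn.Conv2d(base * 8, 1, 4, stride=1, padding=1),  # one real/fake logit per patch
            )

        def forward(self, x):
            return self.net(x)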
The loss function used in the training process is as follows:
1) Topology consistency loss (Topological Consistency Loss): used to ensure the correctness of the topology of G_X→Y (remote sensing to map). L_topo = L_graL1 + L_grastr, where L_graL1 is the image gradient L1 loss and L_grastr is the image gradient structure loss.
Here, L_graL1 = E_{x~p(x)} ||G(G_X→Y(x)) - G(y)||_1, and L_grastr = E_{x~p(x)} [1 - (1/(M+N)) (Σ_{j=1..N} (cov(G_j(y), G_j(G_X→Y(x))) + C_1) / (σ(G_j(y))·σ(G_j(G_X→Y(x))) + C_1) + Σ_{i=1..M} (cov(G_i(y), G_i(G_X→Y(x))) + C_2) / (σ(G_i(y))·σ(G_i(G_X→Y(x))) + C_2))]. In these formulas, x~p(x) denotes sampling from the remote sensing image samples; C_1 and C_2 are constant terms; M and N denote that the input image has M rows and N columns; cov(G_j(y), G_j(G_X→Y(x))) is the covariance of G_j(y) and G_j(G_X→Y(x)); σ(G_j(y)) and σ(G_j(G_X→Y(x))) are their standard deviations; the row terms cov(G_i(y), G_i(G_X→Y(x))), σ(G_i(y)) and σ(G_i(G_X→Y(x))) are defined likewise. For the 255×255 gradient images of the real map and the generated map, the pixel matrices G(y) and G(G_X→Y(x)) have 255 rows and 255 columns. For the j-th column (i-th row), the pixel values of the points on that column (row) form an M-dimensional (N-dimensional) random variable G_j (G_i). G(y) is the gradient image of the real map y, and G(G_X→Y(x)) is the gradient image of the generated fake map G_X→Y(x).
2) Content loss (Content Loss): the aim is to ensure that the generated map is similar in content to the Ground Truth. Here L_cycX and L_cycY are the cycle losses, and L_dirX and L_dirY are the direct losses. In the unsupervised phase, the cycle losses are employed; in the supervised phase, the direct losses are employed.
Here, L_cycX = λ·E_{x~p(x)} ||G_Y→X(G_X→Y(x)) - x||_1 represents the cycle loss of the remote sensing image, where λ is the fine-tuning coefficient and L1u represents the L1 loss in the unsupervised stage; the cycle loss computes the pixel-wise L1 difference so that the generated map and the remote sensing image keep cycle consistency in content. x~p(x) represents sampling from the remote sensing image samples, and G_Y→X(G_X→Y(x)) - x computes the cycle loss between the fake remote sensing image, obtained by passing the fake map image G_X→Y(x) through G_Y→X, and the real remote sensing image x.
Here, L_cycY = E_{y~p(y)} ||G_X→Y(G_Y→X(y)) - y||_1 + L_topo represents the cycle loss of the map, where y~p(y) represents sampling from the map samples. The topology consistency loss L_topo is introduced to keep the topology of the generated image cyclically consistent with that of the target image. G_X→Y(G_Y→X(y)) - y computes the cycle loss between the fake map image, obtained by passing the fake remote sensing image G_Y→X(y) through G_X→Y, and the real map image y.
The direct loss of map to remote sensing image, L_dirX = λ·E_{(x,y)} ||G_Y→X(y) - x||_1, maintains content consistency through the L1 loss function, where λ is the fine-tuning coefficient, L1 represents the L1 loss, and the term is the pixel L1 loss. G_Y→X(y) - x computes the loss between the generated fake remote sensing image and the real remote sensing image.
The direct loss of remote sensing image to map, L_dirY = λ·E_{(x,y)} ||G_X→Y(x) - y||_1 + L_topo, keeps the generated map consistent with the remote sensing image in topology through the L_topo loss. G_X→Y(x) - y computes the loss between the generated fake map image and the real map image.
3) Adversarial loss (Adversarial Loss): the discriminator distinguishes the difference between the generated image and the real image. The purpose of the generator G is to minimize the loss function value, and the purpose of the discriminator D is to maximize it. The formulas are as follows:
The adversarial loss of remote sensing image to map is L_GAN(G_X→Y, D_Y) = E_{y~p(y)}[log D_Y(y)] + E_{x~p(x)}[log(1 - D_Y(G_X→Y(x)))], where G_X→Y is the generator from remote sensing images to maps and the generated image is input to the discriminator D_Y for discrimination.
The adversarial loss of map to remote sensing image is L_GAN(G_Y→X, D_X) = E_{x~p(x)}[log D_X(x)] + E_{y~p(y)}[log(1 - D_X(G_Y→X(y)))], where G_Y→X is the generator from maps to remote sensing images and the generated image is sent to the discriminator D_X for discrimination.
4) Identity loss (Identity Loss): used to ensure consistency between the converted image and the original image. For example, a map fed into the generator G_X→Y should remain as consistent as possible with the input map, i.e. preserve the content and color of the map itself. The formula is as follows: L_identity = E_{y~p(y)} ||G_X→Y(y) - y||_1 + E_{x~p(x)} ||G_Y→X(x) - x||_1.
step 4: performing remote sensing image generation map test on the optimal model obtained in the step 3 to obtain a map image;
As shown in fig. 6, which presents the generation results of the map generation models, CycleGAN and Pix2Pix produce erroneous content, while SmapGAN suffers from blurred content and unclear edges. In the style converter, the invention adopts MHSA-ResBlock, i.e. a residual module combined with Multi-Headed Self-Attention, to capture the long-range dependencies between features and thus provide more accurate feature information for the upsampling operation. Second, the novel upsampling method TC-Carafe, combining conventional transposed convolution with the Carafe operator, performs upsampling through upsampling kernel prediction and feature reassembly, so the generated map shows a clear visual improvement in roads, buildings, edge details and the color saturation of map content. The index comparison between this map generation method and the other methods is shown in Table 1; the advantages of the method are evident, with every evaluation index (peak signal-to-noise ratio (PSNR), structural similarity (SSIM), root mean square error (RMSE)) superior to the other methods.
TABLE 1
Model PSNR SSIM RMSE
Pix2Pix 19.9255 0.6719 28.9183
CycleGAN 24.5271 0.8157 18.3162
SmapGAN 27.5014 0.8742 12.4684
Ours 28.1147 0.8784 11.7484
While the present invention has been described in detail with reference to the drawings, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Claims (4)

1. A method for generating a map based on style migration and remote sensing images is characterized by comprising the following steps:
The method specifically comprises the following steps:
Step 1: firstly, dividing an arbitrary public remote sensing image data set into a training set, a verification set and a test set according to a set proportion; the data set comprises a plurality of paired images, and each paired image comprises a remote sensing image and a corresponding map image; meanwhile, data enhancement processing is carried out on the training set and the verification set, and the data enhancement processing is not carried out on the test set;
Step 2: constructing a map generation network model, wherein the model comprises an encoder, a style conversion module and a decoder;
The encoder comprises 1 convolution layer with the convolution kernel size of 7×7, a batch normalization layer, a ReLU activation function and 2 downsampling layers, wherein the downsampling layers comprise the convolution layer with the convolution kernel size of 3×3, the batch normalization layer and the ReLU activation function;
The style conversion module combines residual blocks with a multi-head self-attention mechanism as the main structure of the style converter; the feature map Feature_R obtained after encoder processing passes sequentially through 9 residual blocks, each consisting of a 3×3 convolution layer, a ReLU activation function and a 3×3 convolution layer, to perform the preliminary style conversion; in the last layer of the style converter, a 1×1 convolution layer, a ReLU activation function, a 4-head self-attention module, a ReLU activation function, a 1×1 convolution layer and a ReLU activation function are used in sequence to output Feature_M;
the decoder comprises an upsampling module and a convolution layer with a convolution kernel size of 7×7, and a conventional 3×3 transposed convolution taking Feature_M as input is used as the upsampling method of the first layer, expressed as: F_1 = TC(Feature_M); where TC denotes the transposed-convolution operation and F_1 denotes the result after the 3×3 transposed convolution; the Carafe upsampling operator is used at the second layer, wherein the Carafe module consists of three convolution layers and is divided into an upsampling kernel prediction module and a feature reassembly module;
step 3: inputting the training set and the verification set of step 1 into the network model built in step 2 for optimization training to obtain the optimal model;
step 4: inputting the remote sensing image into the optimal model obtained by the training of step 3, and outputting the corresponding map image.
2. The method of generating a map from a remote sensing image of claim 1, wherein: the data enhancement described in step 1 includes flipping and rotating operations.
3. The method of generating a map from a remote sensing image of claim 1, wherein: the Carafe module processing procedure in step 2 is specifically as follows:
First, the number of channels of the input feature map is compressed from C to C_n using a 1×1 convolution layer; it is expressed as: F_2 = C_1(F_1); wherein C_1 denotes the channel compression operation and F_2 denotes the result obtained by channel-compressing F_1;
Next, upsampling kernel prediction is performed with a 3×3 convolution kernel, where the input is of size H×W×C_n and the output is of size H×W×(σ²·k_up²); the parameters are set to σ = 2 and k_up = 3, where σ denotes the upsampling factor and k_up denotes the reassembly kernel size; by this operation the receptive field of the encoder is enlarged and context information can be fully utilized over a larger area; the channel dimension is then unfolded into the spatial dimension to obtain a tensor of shape σH×σW×k_up², and each reassembly kernel of size k_up×k_up is spatially normalized; it is expressed as: F_3 = Softmax(Unfolding(KPM(F_2))); wherein KPM denotes performing the upsampling kernel prediction operation, Unfolding denotes spatially expanding its output, Softmax denotes normalizing each kernel, and F_3 denotes the result of upsampling kernel prediction;
in the feature reassembly module, a dot product operation is carried out between each predicted upsampling kernel and the k_up×k_up region centered on the corresponding feature point of the input feature map to realize feature reassembly; finally, the channels are compressed using a convolution layer with a 1×1 kernel and Feature_CM is output; it is expressed as: Feature_CM = C_2(CARM(N(F_1i, k_up), F_3i)); wherein CARM denotes the feature reassembly operation, N(F_1i, k_up) denotes the k_up×k_up region of the input feature map centered on feature point F_1i, F_3i denotes the upsampling kernel of that point predicted by the upsampling kernel prediction module, C_2 denotes the channel compression operation, and Feature_CM denotes the result output by TC-Carafe;
finally, Feature_CM is fed into a 7×7 convolution layer followed by a Tanh activation to output the generated map image.
4. The method of generating a map from a remote sensing image of claim 1, wherein: in step 3, the model constructed in step 2 is optimally trained by adopting a loss function and a discriminator:
The discriminator adopts PatchGAN, is formed by multi-layer convolution, and disassembles the judging task of the whole image into judging tasks of a plurality of local areas in the image;
The loss function used in the training process is as follows:
1) Topology consistency loss: L_topo = L_graL1 + L_grastr, wherein L_graL1 is the image gradient L1 loss and L_grastr is the image gradient structure loss;
in the formulas, L_graL1 = E_{x~p(x)} ||G(G_X→Y(x)) - G(y)||_1 and L_grastr = E_{x~p(x)} [1 - (1/(M+N)) (Σ_{j=1..N} (cov(G_j(y), G_j(G_X→Y(x))) + C_1) / (σ(G_j(y))·σ(G_j(G_X→Y(x))) + C_1) + Σ_{i=1..M} (cov(G_i(y), G_i(G_X→Y(x))) + C_2) / (σ(G_i(y))·σ(G_i(G_X→Y(x))) + C_2))], where x~p(x) denotes sampling from the remote sensing image samples; C_1 and C_2 are constant terms; M and N denote that the input image has M rows and N columns; cov(G_j(y), G_j(G_X→Y(x))) is the covariance of G_j(y) and G_j(G_X→Y(x)); σ(G_j(y)) and σ(G_j(G_X→Y(x))) are their standard deviations; the row terms cov(G_i(y), G_i(G_X→Y(x))), σ(G_i(y)) and σ(G_i(G_X→Y(x))) are defined likewise; for the 255×255 gradient images of the real map and the generated map, the pixel matrices G(y) and G(G_X→Y(x)) have 255 rows and 255 columns; for the j-th column (i-th row), the pixel values of the points on that column (row) form an M-dimensional (N-dimensional) random variable G_j (G_i); G(y) is the gradient image of the real map y, and G(G_X→Y(x)) is the gradient image of the generated fake map G_X→Y(x);
2) Content loss: the aim is to ensure that the generated map is similar in content to the Ground Truth, wherein L_cycX and L_cycY are the cycle losses and L_dirX and L_dirY are the direct losses; in the unsupervised stage, the cycle losses are adopted; in the supervised stage, the direct losses are adopted;
in the formulas, L_cycX = λ·E_{x~p(x)} ||G_Y→X(G_X→Y(x)) - x||_1 represents the cycle loss of the remote sensing image, λ is the fine-tuning coefficient, L1u represents the L1 loss in the unsupervised stage, and the pixel-wise L1 loss computes the difference between pixels so that the generated map and the remote sensing image keep cycle consistency in content; x~p(x) represents sampling from the remote sensing image samples, and G_Y→X(G_X→Y(x)) - x computes the cycle loss between the fake remote sensing image, obtained by passing the fake map image G_X→Y(x) through G_Y→X, and the real remote sensing image x;
L_cycY = E_{y~p(y)} ||G_X→Y(G_Y→X(y)) - y||_1 + L_topo represents the cycle loss of the map, and y~p(y) represents sampling from the map samples; the topology consistency loss L_topo is introduced to keep the topology of the generated image cyclically consistent with that of the target image; G_X→Y(G_Y→X(y)) - y computes the cycle loss between the fake map image, obtained by passing the fake remote sensing image G_Y→X(y) through G_X→Y, and the real map image y;
the direct loss of map to remote sensing image, L_dirX = λ·E_{(x,y)} ||G_Y→X(y) - x||_1, maintains content consistency through the L1 loss function, wherein λ is the fine-tuning coefficient, L1 represents the L1 loss, and the term is the pixel L1 loss; G_Y→X(y) - x computes the loss between the generated fake remote sensing image and the real remote sensing image;
the direct loss of remote sensing image to map, L_dirY = λ·E_{(x,y)} ||G_X→Y(x) - y||_1 + L_topo, keeps the generated map consistent with the remote sensing image in topology through the L_topo loss; G_X→Y(x) - y computes the loss between the generated fake map image and the real map image;
3) Adversarial loss: the discriminator distinguishes the difference between the generated image and the real image; the purpose of the generator G is to minimize the loss function value, and the purpose of the discriminator D is to maximize it; the formulas are as follows:
the adversarial loss of remote sensing image to map is L_GAN(G_X→Y, D_Y) = E_{y~p(y)}[log D_Y(y)] + E_{x~p(x)}[log(1 - D_Y(G_X→Y(x)))], wherein G_X→Y is the generator from remote sensing images to maps, and the generated image is input to the discriminator D_Y for discrimination;
the adversarial loss of map to remote sensing image is L_GAN(G_Y→X, D_X) = E_{x~p(x)}[log D_X(x)] + E_{y~p(y)}[log(1 - D_X(G_Y→X(y)))], wherein G_Y→X is the generator from maps to remote sensing images, and the generated image is sent to the discriminator D_X for discrimination;
4) Identity loss: used to ensure consistency between the converted image and the original image; for example, a map fed into the generator G_X→Y should remain as consistent as possible with the input map, i.e. preserve the content and color of the map itself; the formula is as follows: L_identity = E_{y~p(y)} ||G_X→Y(y) - y||_1 + E_{x~p(x)} ||G_Y→X(x) - x||_1.
CN202410299406.8A 2024-03-15 2024-03-15 Method for generating map based on style migration and remote sensing image Pending CN118096922A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410299406.8A CN118096922A (en) 2024-03-15 2024-03-15 Method for generating map based on style migration and remote sensing image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410299406.8A CN118096922A (en) 2024-03-15 2024-03-15 Method for generating map based on style migration and remote sensing image

Publications (1)

Publication Number Publication Date
CN118096922A true CN118096922A (en) 2024-05-28

Family

ID=91147648

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410299406.8A Pending CN118096922A (en) 2024-03-15 2024-03-15 Method for generating map based on style migration and remote sensing image

Country Status (1)

Country Link
CN (1) CN118096922A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118521498A (en) * 2024-07-23 2024-08-20 南昌航空大学 Industrial defect image generation method, device, medium and product


Similar Documents

Publication Publication Date Title
CN111476219B (en) Image target detection method in intelligent home environment
CN113449594B (en) Multilayer network combined remote sensing image ground semantic segmentation and area calculation method
CN111738111A (en) Road extraction method of high-resolution remote sensing image based on multi-branch cascade void space pyramid
CN114187450B (en) Remote sensing image semantic segmentation method based on deep learning
CN110533631A (en) SAR image change detection based on the twin network of pyramid pondization
CN110555841B (en) SAR image change detection method based on self-attention image fusion and DEC
CN106295613A (en) A kind of unmanned plane target localization method and system
CN112489164B (en) Image coloring method based on improved depth separable convolutional neural network
CN118096922A (en) Method for generating map based on style migration and remote sensing image
CN110599502B (en) Skin lesion segmentation method based on deep learning
CN117788957B (en) Deep learning-based qualification image classification method and system
CN114913379B (en) Remote sensing image small sample scene classification method based on multitasking dynamic contrast learning
CN116524419B (en) Video prediction method and system based on space-time decoupling and self-attention difference LSTM
CN115311502A (en) Remote sensing image small sample scene classification method based on multi-scale double-flow architecture
CN113554653A (en) Semantic segmentation method for long-tail distribution of point cloud data based on mutual information calibration
CN117593666B (en) Geomagnetic station data prediction method and system for aurora image
CN115170403A (en) Font repairing method and system based on deep meta learning and generation countermeasure network
CN114764880B (en) Multi-component GAN reconstructed remote sensing image scene classification method
CN116993639A (en) Visible light and infrared image fusion method based on structural re-parameterization
CN115953330A (en) Texture optimization method, device, equipment and storage medium for virtual scene image
CN116797681A (en) Text-to-image generation method and system for progressive multi-granularity semantic information fusion
CN110866866A (en) Image color-matching processing method and device, electronic device and storage medium
CN114792349B (en) Remote sensing image conversion map migration method based on semi-supervised generation countermeasure network
Li et al. Semantic prior-driven fused contextual transformation network for image inpainting
CN116258923A (en) Image recognition model training method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination