CN114283285A - Cross consistency self-training remote sensing image semantic segmentation network training method and device - Google Patents

Cross consistency self-training remote sensing image semantic segmentation network training method and device

Info

Publication number
CN114283285A
CN114283285A
Authority
CN
China
Prior art keywords
remote sensing
sensing image
training
semantic segmentation
segmentation network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111364685.4A
Other languages
Chinese (zh)
Inventor
吕亮
齐革军
牛晨晖
陈晓路
杭兆峰
刘溟江
姚中原
严祺慧
王恩民
任鑫
王华
童彤
赵鹏程
杜静宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaneng Power International Jiangsu Energy Development Co Ltd
Huaneng Yancheng Dafeng New Energy Power Generation Co ltd
Huaneng Clean Energy Research Institute
Clean Energy Branch of Huaneng International Power Jiangsu Energy Development Co Ltd
Original Assignee
Huaneng Power International Jiangsu Energy Development Co Ltd
Huaneng Yancheng Dafeng New Energy Power Generation Co ltd
Huaneng Clean Energy Research Institute
Clean Energy Branch of Huaneng International Power Jiangsu Energy Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaneng Power International Jiangsu Energy Development Co Ltd, Huaneng Yancheng Dafeng New Energy Power Generation Co ltd, Huaneng Clean Energy Research Institute, Clean Energy Branch of Huaneng International Power Jiangsu Energy Development Co Ltd filed Critical Huaneng Power International Jiangsu Energy Development Co Ltd
Priority to CN202111364685.4A priority Critical patent/CN114283285A/en
Publication of CN114283285A publication Critical patent/CN114283285A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The application discloses a cross-consistency self-training remote sensing image semantic segmentation network training method and device, relating to the technical field of remote sensing image segmentation. The specific implementation scheme is as follows: constructing a remote sensing image semantic segmentation network, which is UNet; training the remote sensing image semantic segmentation network according to a remote sensing image sample data set; and inputting remote sensing images shot in real time into the trained remote sensing image semantic segmentation network and outputting their semantic segmentation results. According to the embodiment of the application, the remote sensing image is segmented by constructing the remote sensing image semantic segmentation network and training it on the remote sensing image sample data set. The embodiment of the application avoids the large amount of manual labor consumed by segmenting and labeling remote sensing images, improves training efficiency, and improves the accuracy of remote sensing image segmentation.

Description

Cross consistency self-training remote sensing image semantic segmentation network training method and device
Technical Field
The application relates to the technical field of remote sensing image segmentation, in particular to a cross consistency self-training remote sensing image semantic segmentation network training method and device.
Background
Image segmentation refers to the technique and process of dividing an image into regions with distinct characteristics and extracting targets of interest. The remote sensing image is a common and typical color image and has attracted wide attention. Using image segmentation technology to mark each pixel in a remote sensing image as a type of ground object, such as a building, a water body, a road, farmland or a vehicle, has long been a research hotspot among scholars. Traditional image segmentation methods (such as the threshold method, k-Means clustering, region-based methods and edge detection) are concerned only with finding the boundary contour of the ground object and not with its category. In recent years, due to the rapid development of deep learning and the great improvement of computer storage and computing power, the semantic segmentation method based on the deep convolutional neural network has become a new tool for segmenting high-resolution remote sensing images.
However, semantic segmentation based on the deep convolutional neural network can be regarded as a pixel-level classification task: ground object targets in the image need to be densely labeled, and such labeling is difficult. For a high-resolution remote sensing image, labeling a semantic segmentation data set requires a large amount of labor and time, which brings great difficulty to the semantic segmentation of remote sensing images based on the deep convolutional neural network.
Disclosure of Invention
The application provides a cross consistency self-training remote sensing image semantic segmentation network training method and device. The technical scheme of the application is as follows:
according to a first aspect of an embodiment of the application, a cross consistency self-training remote sensing image semantic segmentation network training method is provided, and comprises the following steps:
constructing a remote sensing image semantic segmentation network which is UNet;
training the semantic segmentation network of the remote sensing image according to the sample data set of the remote sensing image;
and inputting the remote sensing image shot in real time into the trained remote sensing image semantic segmentation network, and outputting a semantic segmentation result of the remote sensing image.
Optionally, the training the semantic segmentation network of the remote sensing image according to the remote sensing image data set includes:
acquiring a sample data set of the remote sensing image, wherein the sample data set of the remote sensing image comprises a remote sensing image;
marking the ground object categories of pixels in the remote sensing image to generate a corresponding label remote sensing image;
preprocessing the remote sensing image sample data set to obtain a training set, a verification set and a test set by division;
constructing a remote sensing image semantic segmentation network comprising an encoder, a main decoder and a plurality of auxiliary decoders;
and dividing the training set into a marked sample and a non-marked sample, and inputting the marked sample and the non-marked sample into the remote sensing image semantic segmentation network for training.
Optionally, the inputting into the remote sensing image semantic segmentation network for training includes:
determining a hyper-parameter and a loss function used for training the semantic segmentation network of the remote sensing image;
optimizing parameters of the remote sensing image semantic segmentation network until the prediction precision on the verification set reaches a preset precision threshold;
and inputting the test set into a trained remote sensing image semantic segmentation network to verify the accuracy of network segmentation.
Optionally, the training of the remote sensing semantic segmentation network includes:
dividing the training set remote sensing image into a marked sample and an unmarked sample, wherein the marked sample comprises the remote sensing image and a corresponding label remote sensing image, and the unmarked sample only comprises the remote sensing image;
in the first training stage, marked samples and unmarked samples are input into the encoder of the remote sensing semantic segmentation network, the marked-sample feature map extracted by the encoder is input into the main decoder, and the supervision loss is calculated from the obtained prediction result and the label data;
in the first part of the second training stage, the unmarked-sample feature map extracted by the encoder is randomly transformed and input into the main decoder and the auxiliary decoders, and the consistency loss is calculated from the prediction results of the auxiliary decoders and the prediction result of the main decoder;
and in the second part of the second training stage, the prediction results of the main decoder and the auxiliary decoders are fused to obtain pseudo marks, and the pseudo marks and the corresponding remote sensing images are used as marked samples for supervised training.
Optionally, the preprocessing the remote sensing image sample data set includes:
randomly sampling the remote sensing images in the remote sensing image sample data set into small images for multiple rounds of batch training, wherein 480 samples are drawn in each round of training;
setting the sampling size and the training batch size of the remote sensing images according to the available video memory, wherein the default input image size is 512 × 512 and the default training batch size is 10;
performing multiple rounds of random sampling on the remote sensing images and the label remote sensing images, wherein each sampling obtains a labeled remote sensing image with its corresponding label image, both of size 512 × 512, and an unlabeled image;
performing data enhancement on the samples of each round a random number of times and to a random degree, and setting the enhancement parameter ranges of the training samples, wherein the data enhancement comprises at least one of the following: random rotation by n × 90° (n = 0, 1, 2, 3); random 180° flipping in the horizontal or vertical direction; random scaling, with the scaling factor in the range [0.5, 2]; random brightness enhancement, with the brightness factor in the range [0.5, 2]; random contrast enhancement, with the contrast factor in the range [0.5, 2]; and random saturation enhancement, with the saturation factor in the range [0.5, 2].
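For illustration, a minimal sketch of such an augmentation pipeline in Python is given below; it assumes images are H×W×C float arrays in [0, 1], and the function and parameter names are illustrative, not taken from the patent:

```python
import random
import numpy as np

def augment(image, label=None):
    # Random rotation by n * 90 degrees, n in {0, 1, 2, 3}
    n = random.randint(0, 3)
    image = np.rot90(image, n, axes=(0, 1)).copy()
    if label is not None:
        label = np.rot90(label, n, axes=(0, 1)).copy()
    # Random 180-degree flip in the horizontal or vertical direction
    if random.random() < 0.5:
        axis = random.choice([0, 1])
        image = np.flip(image, axis=axis).copy()
        if label is not None:
            label = np.flip(label, axis=axis).copy()
    # Brightness, contrast and saturation, each with a factor in [0.5, 2]
    image = image * random.uniform(0.5, 2.0)                  # brightness
    mean = image.mean()
    image = (image - mean) * random.uniform(0.5, 2.0) + mean  # contrast
    gray = image.mean(axis=2, keepdims=True)
    image = gray + (image - gray) * random.uniform(0.5, 2.0)  # saturation
    # Random size scaling in [0.5, 2] would use an interpolating resize
    # (omitted here to keep the sketch dependency-free)
    return np.clip(image, 0.0, 1.0), label
```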
Optionally, the remote sensing image semantic segmentation network is a classical semantic segmentation network UNet, where the UNet includes an encoder and a main decoder, and the number of auxiliary decoders is 3.
Optionally, the hyper-parameters, the loss functions and the learning-rate schedule used in the training process include:

lr = base_lr × warmup_factor^epoch (for epoch < warmup_epoch)

wherein base_lr is 4.2e-6; the warm-up factor warmup_factor is 1.2; epoch is the number of training iterations; lr is the learning rate, which rises gradually as training proceeds; and warmup_epoch is the number of warm-up iterations, set to 30;
a warm start is performed, and after training exceeds warmup_epoch, a polynomial learning-rate decay strategy is used, with the maximum number of training iterations set to 1500, the decay exponent power set to 0.9, and the maximum learning rate max_lr set to 1e-3;
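Read this way (geometric warm-up from base_lr, then polynomial decay; note that 4.2e-6 × 1.2^30 ≈ 1e-3, the stated maximum learning rate), the schedule can be sketched as follows. The warm-up form is a reconstruction from the stated constants, not a formula quoted from the patent:

```python
def learning_rate(epoch, base_lr=4.2e-6, warmup_factor=1.2, warmup_epoch=30,
                  max_epoch=1500, power=0.9, max_lr=1e-3):
    if epoch < warmup_epoch:
        # Geometric warm-up: the rate grows by warmup_factor per epoch
        return base_lr * warmup_factor ** epoch
    # Polynomial decay from max_lr over the remaining epochs
    frac = (epoch - warmup_epoch) / (max_epoch - warmup_epoch)
    return max_lr * (1.0 - frac) ** power
```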
the loss function used for calculating the supervision loss is the cross-entropy loss function, and the loss function used for calculating the consistency loss is the mean-square-error loss function; the overall loss function L_total of the remote sensing image semantic segmentation network is:

L_total = L_sup(ŷ_i, y_i) + λ(epoch) · L_cons(e_m, e_a)

wherein ŷ_i is the prediction result; y_i is the marked image; L_sup is the supervision loss; e_m is the prediction result of the main decoder; e_a is the prediction result of an auxiliary decoder; L_cons is the consistency loss; and λ(epoch) is the weight of L_cons.
In the first training stage, the overall loss is dominated by the supervision loss part; in the second training stage, the overall loss is dominated by the consistency loss, and the specific formula of λ(epoch) is:

λ(epoch) = w · min(epoch / a, 1)

wherein num_epochs is the total number of training iterations; the training round threshold a is set to 200, so that λ(epoch) increases gradually before the a-th training iteration and stabilizes at w once training exceeds a iterations; num_epochs is set to 1500 and w is set to 1.
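As a sketch under the reading above (linear ramp of the consistency weight, averaged mean-square-error consistency term; detaching the main decoder's prediction is an added assumption so that consistency gradients flow only through the auxiliary decoders):

```python
import torch.nn.functional as F

def lambda_weight(epoch, a=200, w=1.0):
    # Ramps up to w over the first a epochs, then stays at w
    return w * min(epoch / a, 1.0)

def total_loss(sup_logits, labels, main_u_logits, aux_u_logits_list, epoch):
    # Supervision loss: cross entropy on the marked-sample prediction
    l_sup = F.cross_entropy(sup_logits, labels)
    # Consistency loss: MSE between each auxiliary prediction and the
    # (detached) main-decoder prediction on the unmarked batch
    main_probs = F.softmax(main_u_logits, dim=1).detach()
    l_cons = sum(F.mse_loss(F.softmax(a_l, dim=1), main_probs)
                 for a_l in aux_u_logits_list) / len(aux_u_logits_list)
    return l_sup + lambda_weight(epoch) * l_cons
```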
Optionally, the randomly transforming the unlabeled sample feature map includes:
dropout with probability 0.5; random 180° flipping in the horizontal or vertical direction; and addition of uniformly distributed noise N ~ U(-0.2, 0.2).
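A sketch of this transform on a feature tensor of shape (B, C, H, W); if the spatial flip is applied, the auxiliary decoder's output would need the inverse flip before it is compared with the main decoder's output, which is left out here:

```python
import torch
import torch.nn.functional as F

def perturb(feat):
    feat = F.dropout(feat, p=0.5, training=True)          # dropout = 0.5
    if torch.rand(1).item() < 0.5:                        # random 180-degree flip
        feat = torch.flip(feat, dims=[2 if torch.rand(1).item() < 0.5 else 3])
    noise = torch.empty_like(feat).uniform_(-0.2, 0.2)    # N ~ U(-0.2, 0.2)
    return feat + noise
```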
Optionally, the fusing the prediction results of the primary decoder and the secondary decoder includes:
voting the prediction result of the main decoder and the prediction results of the auxiliary decoders position by position, and taking the category with the most votes at each pixel as the final fusion result to generate the pseudo mark.
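A sketch of this position-wise vote (names assumed, consistent with the earlier snippets):

```python
import torch

def fuse_to_pseudo_label(main_logits, aux_logits_list):
    # Per-decoder hard predictions, stacked as (n_decoders, B, H, W)
    preds = torch.stack([main_logits.argmax(dim=1)] +
                        [a.argmax(dim=1) for a in aux_logits_list], dim=0)
    # The most frequent class at each pixel becomes the pseudo mark
    pseudo_label, _ = preds.mode(dim=0)
    return pseudo_label
```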
According to a second aspect of the embodiments of the present application, there is provided a cross consistency self-training remote sensing image semantic segmentation network training device, including:
the system comprises a construction module, a semantic segmentation module and a semantic segmentation module, wherein the construction module is used for constructing a remote sensing image semantic segmentation network which is UNet;
the training module is used for training the remote sensing image semantic segmentation network according to the remote sensing image sample data set;
and the segmentation module is used for inputting the remote sensing image shot in real time into the trained remote sensing image semantic segmentation network and outputting the semantic segmentation result of the remote sensing image.
According to a third aspect of the embodiments of the present application, there is provided a cross consistency self-training remote sensing image semantic segmentation network training device, including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of any of the first aspects above.
According to a fourth aspect of embodiments of the present application, there is provided a non-transitory computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of a cross-consistency self-training remote sensing image semantic segmentation network training device, enable the cross-consistency self-training remote sensing image semantic segmentation network training device to perform the method according to any one of the first aspect.
The technical scheme provided by the embodiment of the application at least has the following beneficial effects:
in the second training stage, a small amount of labeled data and a large amount of unlabeled data are utilized; the consistency loss is computed between the output of the main decoder and the outputs of the auxiliary decoders on enhanced data, which effectively prevents the model from overfitting and improves its generalization ability. The output results of all decoders are fused to obtain pseudo-labeled images, and the pseudo-labeled images together with the corresponding remote sensing images are used as labeled samples for supervised training, which can further improve the performance of the model. Therefore, under the condition of insufficient labeled data, the method and the device can train a model with better performance by using a large amount of unlabeled data, reducing the requirement for labeled samples and the labor cost of data labeling.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the application and are not to be construed as limiting the application.
FIG. 1 is a flow diagram illustrating a method for training a cross-consistency self-trained remote sensing image semantic segmentation network according to an exemplary embodiment.
FIG. 2 is a flow diagram illustrating a method for training a cross-consistency self-trained remote sensing image semantic segmentation network according to an exemplary embodiment.
FIG. 3 is a flow diagram illustrating a method for training a cross-consistency self-trained remote sensing image semantic segmentation network according to an exemplary embodiment.
FIG. 4 is a block diagram illustrating a cross-consistency self-trained remote sensing image semantic segmentation network training apparatus according to an exemplary embodiment.
FIG. 5 is a block diagram illustrating a cross-consistency learning remote sensing image semantic segmentation model training system according to an exemplary embodiment.
FIG. 6 is a block diagram illustrating a cross-consistency learning remote sensing image semantic segmentation model training system according to an exemplary embodiment.
FIG. 7 is a block diagram illustrating an apparatus in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present application better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The purpose of image segmentation is to label each pixel with a category; for remote sensing images, this means a type of ground feature, such as buildings, water bodies, roads, farmland, vehicles, and so on. Image semantic segmentation developed from traditional image segmentation methods. Traditional methods (the threshold method, k-Means clustering, region-based methods and edge detection) care only about the boundary contour of the ground feature and not about its category, whereas semantic segmentation must not only accurately find the contour of the ground feature but also accurately judge its category, i.e., give it semantic meaning. Due to the rapid development of deep learning and the great improvement of computer storage and computing power, the semantic segmentation method based on the deep convolutional neural network has become a new tool for segmenting high-resolution remote sensing images.
The semantic segmentation based on the deep convolutional neural network can be regarded as a pixel-level classification task: ground object targets in the image need to be densely labeled, and such labeling is difficult. For a high-resolution remote sensing image, labeling a semantic segmentation data set requires a large amount of labor and time, which brings great difficulty to the semantic segmentation of remote sensing images based on the deep convolutional neural network.
There are two solutions in the related art: 1. The method based on self-training, which mainly comprises the following steps: first, a model is trained on the labeled data. Second, the pre-trained model is used to generate pseudo labels for the unlabeled data set. Third, a new model is retrained using the true labels of the labeled data set and the pseudo labels of the unlabeled data set. Fourth, the above process is repeated several times. This approach works well, but the repeated training increases the time cost.
2. The method based on consistency learning, whose core is to encourage the model to produce similar outputs for the same sample under different transformations, where the transformations include random rotation, flipping, color change and the like. The whole process is performed simultaneously with supervised training and works better, but there is still room for improvement.
FIG. 1 is a flow diagram illustrating a method for training a cross-consistency self-trained remote sensing image semantic segmentation network according to an exemplary embodiment. As shown in FIG. 1, the method is used for semantic segmentation of the remote sensing image and comprises the following steps.
Step 101, constructing a remote sensing image semantic segmentation network, wherein the remote sensing semantic segmentation network is UNet;
In the embodiment of the application, a remote sensing image semantic segmentation network is constructed. The remote sensing image semantic segmentation network is UNet, which consists of an encoder and a decoder: the encoder in the first half is used for feature extraction, and the decoder in the second half is used for upsampling.
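By way of illustration only, a stripped-down PyTorch sketch of an encoder with a main decoder and auxiliary decoders is given below; the channel widths are arbitrary assumptions and UNet's skip connections are omitted for brevity, so this is not the patent's exact architecture:

```python
import torch.nn as nn

class Encoder(nn.Module):
    """First half: downsampling feature extraction."""
    def __init__(self, in_ch=3, feat=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(128, feat, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """Second half: upsampling to per-pixel class scores."""
    def __init__(self, n_classes, feat=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(feat, 128, 2, stride=2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 2, stride=2), nn.ReLU(inplace=True),
            nn.Conv2d(64, n_classes, 1))

    def forward(self, f):
        return self.net(f)

class SegNet(nn.Module):
    """Encoder plus one main decoder and several auxiliary decoders."""
    def __init__(self, n_classes=5, n_aux=3):
        super().__init__()
        self.encoder = Encoder()
        self.main_decoder = Decoder(n_classes)
        self.aux_decoders = nn.ModuleList(Decoder(n_classes) for _ in range(n_aux))

    def forward(self, x):
        return self.main_decoder(self.encoder(x))
```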
Step 102, training the remote sensing image semantic segmentation network according to a remote sensing image sample data set;
in the embodiment of the application, after the remote sensing image semantic segmentation network is constructed, the remote sensing image semantic segmentation network is trained according to the collected remote sensing image data set. The training process is divided into a first training stage and a second training stage, and in the first training stage, the overall loss is dominated by the supervision loss part; in the second training phase, the overall loss is dominated by the consistency loss.
And 103, inputting the remote sensing image shot in real time into the trained remote sensing image semantic segmentation network, and outputting a semantic segmentation result of the remote sensing image.
In the embodiment of the application, after the training of the remote sensing image semantic segmentation network is finished, the remote sensing image shot in real time can be input into the remote sensing image semantic segmentation network so as to obtain the semantic segmentation result of the remote sensing image.
According to the embodiment of the application, a small amount of labeled data and a large amount of label-free data are utilized in the second training stage, consistency loss is solved for the output of the main decoder and the output of the auxiliary decoder through data enhancement, model overfitting is effectively prevented, and the generalization capability of the model is improved. And output results of all decoders are fused to obtain pseudo-labeled images, and the pseudo-labeled images and corresponding remote sensing images are used as labeled samples to conduct supervision training, so that the performance of the model can be further improved. Therefore, under the condition of insufficient marking data, the method and the device can train a model with better performance by using a large amount of unmarked data, reduce the requirement on marking samples and reduce the labor cost of data marking.
FIG. 2 is a flow diagram illustrating a method for training a cross-consistency self-trained remote sensing image semantic segmentation network according to an exemplary embodiment. As shown in FIG. 2, the method is used for semantic segmentation of the remote sensing image and comprises the following steps.
Step 201, obtaining a sample data set of the remote sensing image, wherein the sample data set of the remote sensing image comprises a remote sensing image;
in the embodiment of the application, a remote sensing image is required to be collected as sample data to train the semantic segmentation model of the remote sensing image, and historical remote sensing data is obtained from a database to form a sample data set of the remote sensing image.
Step 202, marking the ground object categories of pixels in the remote sensing image to generate a corresponding label remote sensing image;
In the embodiment of the present application, the task of the semantic segmentation model of the remote sensing image is to identify and mark the ground feature class to which each pixel in the remote sensing image belongs. In one possible embodiment, the ground feature classes include: buildings, water bodies, roads, farmland and vehicles.
Step 203, preprocessing the sample data set of the remote sensing image to obtain a training set, a verification set and a test set by division;
In the embodiment of the application, in order to improve training efficiency and reduce the workload of manually marking ground feature classes, the remote sensing image sample data set is preprocessed: the remote sensing images are sampled to the same size, the ground feature classes of pixels are marked, random data enhancement is performed on the sampled images, and so on. The remote sensing image sample data set is divided into a training set, a verification set and a test set. The training set is used for training the semantic segmentation model of the remote sensing image; the verification set is used for verifying whether the trained model can accurately segment the remote sensing image; the test set is used for testing the segmentation accuracy of the trained model.
Step 204, constructing a remote sensing image semantic segmentation network comprising an encoder, a main decoder and a plurality of auxiliary decoders;
In the embodiment of the application, the teacher network and the student network have the same structure, each comprising an encoder and a decoder. The encoder is used for downsampling, extracting the high-dimensional features of the remote sensing image to generate a feature tensor; the decoder is used for upsampling, reducing the dimension of the feature tensor to generate the segmentation result, which gives the ground feature class of each pixel in the remote sensing image together with the probability corresponding to that class.
And 205, dividing the training set into a marked sample and a non-marked sample, and inputting the marked sample and the non-marked sample into the remote sensing image semantic segmentation network for training.
Marked samples and unmarked samples are input into the encoder and the main decoder of the remote sensing image semantic segmentation network, and the supervision loss is calculated on the marked samples for training; the unmarked samples are input into the encoder, the main decoder and the auxiliary decoders, and pseudo marks are generated from the outputs of the auxiliary decoders and the prediction result of the main decoder. The pseudo marks and the corresponding remote sensing images are then used as marked samples for supervised training.
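By way of illustration, one training iteration combining these paths might look like the following sketch; the names SegNet, perturb and lambda_weight carry over from the earlier illustrative snippets, everything here is an assumption rather than the patent's exact procedure, and the pseudo-mark retraining of the second part is only indicated in a comment:

```python
import torch.nn.functional as F

def train_step(model, x_l, y_l, x_u, optimizer, epoch):
    # Supervised path: encoder + main decoder on marked samples
    sup_logits = model.main_decoder(model.encoder(x_l))
    l_sup = F.cross_entropy(sup_logits, y_l)
    # Consistency path: encoder + main/auxiliary decoders on unmarked samples
    feat_u = model.encoder(x_u)
    main_probs = F.softmax(model.main_decoder(feat_u), dim=1).detach()
    l_cons = sum(F.mse_loss(F.softmax(d(perturb(feat_u)), dim=1), main_probs)
                 for d in model.aux_decoders) / len(model.aux_decoders)
    # Pseudo marks for self-training would be produced with
    # fuse_to_pseudo_label(...) and fed back in as marked samples (not shown)
    loss = l_sup + lambda_weight(epoch) * l_cons
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```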
FIG. 3 is a flow diagram illustrating a method for training a cross-consistency self-trained remote sensing image semantic segmentation network according to an exemplary embodiment. As shown in FIG. 3, the method is used for semantic segmentation of the remote sensing image and comprises the following steps.
Step 301, determining a hyper-parameter and a loss function used for training the semantic segmentation network of the remote sensing image;
In the embodiment of the application, the hyper-parameters of the semantic segmentation model of the remote sensing image are set before the training process and need to be continuously optimized during training so as to improve the performance and effect of segmenting the remote sensing image. The loss function is used for calculating the difference between the segmentation result and the marked sample; the loss computed from it measures the accuracy of the segmentation result of the remote sensing image.
Step 302, optimizing parameters of the remote sensing image semantic segmentation network until the prediction precision on the verification set reaches a preset precision threshold;
In the embodiment of the application, the remote sensing image semantic segmentation network needs to be optimized to a certain precision. The segmentation precision of the network is verified on the verification set: the semantic segmentation result is compared with the label remote sensing image, and the proportion of pixels matching the labels is computed. If this proportion is greater than or equal to the preset precision threshold, training is sufficient and can be stopped.
And 303, inputting the test set into the trained remote sensing image semantic segmentation network to verify the network segmentation accuracy.
In the embodiment of the application, the trained semantic segmentation network of the remote sensing image is tested on the test set: the segmentation result of the test set is compared with the labels of the test set, checking whether the ground feature class of each pixel is the same; the proportion of matching pixels among all pixels gives the accuracy of the trained network.
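As a small illustrative helper (names assumed), this matching-pixel proportion can be computed as:

```python
def pixel_accuracy(logits, labels):
    # Proportion of pixels whose predicted class matches the label image
    return (logits.argmax(dim=1) == labels).float().mean().item()
```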
Optionally, the training of the remote sensing semantic segmentation network includes:
dividing the training set remote sensing image into a marked sample and an unmarked sample, wherein the marked sample comprises the remote sensing image and a corresponding label remote sensing image, and the unmarked sample only comprises the remote sensing image;
in the first training stage, marked samples and unmarked samples are input into the encoder of the remote sensing semantic segmentation network, the marked-sample feature map extracted by the encoder is input into the main decoder, and the supervision loss is calculated from the obtained prediction result and the label data;
in the embodiment of the application, the supervision loss is calculated by a cross entropy loss function.
In the first part of the second training stage, the unmarked-sample feature map extracted by the encoder is randomly transformed and input into the main decoder and the auxiliary decoders, and the consistency loss is calculated from the prediction results of the auxiliary decoders and the prediction result of the main decoder.
In the embodiment of the present application, the random transformation includes dropout with probability 0.5, random 180° flipping in the horizontal or vertical direction, and addition of uniformly distributed noise N ~ U(-0.2, 0.2). The number of auxiliary decoders is 3. The prediction result of each auxiliary decoder is compared with the prediction result of the main decoder to calculate the consistency loss, which uses the mean-square-error loss function. The maximum value of the coefficient of the consistency loss is set to 1. The consistency loss reflects the difference between the outputs of the auxiliary decoders and the main decoder, so the remote sensing semantic segmentation network is trained with the goal of reducing the consistency loss, i.e., the prediction results of the auxiliary decoders are encouraged to be as consistent as possible with the prediction result of the main decoder.
In the second part of the second training stage, the prediction results of the main decoder and the auxiliary decoders are fused to obtain pseudo marks, and the pseudo marks and the corresponding remote sensing images are used as marked samples for supervised training.
In the embodiment of the present application, the prediction results of the main decoder and the auxiliary decoders are fused, and the ground feature class with the largest number of votes at each pixel is selected by voting to generate the pseudo mark. The pseudo mark can be used as the mark corresponding to the remote sensing image and is input together with the remote sensing image into the encoder and the main decoder for training. This increases the number of training samples and improves the training effect.
Optionally, the preprocessing the remote sensing image sample data set includes:
randomly sampling the remote sensing images in the remote sensing image sample data set into small images for multiple rounds of batch training, wherein 480 samples are drawn in each round of training;
setting the sampling size and the training batch size of the remote sensing images according to the available video memory, wherein the default input image size is 512 × 512 and the default training batch size is 10;
performing multiple rounds of random sampling on the remote sensing images and the label remote sensing images, wherein each sampling obtains a labeled remote sensing image with its corresponding label image, both of size 512 × 512, and an unlabeled image;
performing data enhancement on the samples of each round a random number of times and to a random degree, and setting the enhancement parameter ranges of the training samples, wherein the data enhancement comprises at least one of the following: random rotation by n × 90° (n = 0, 1, 2, 3); random 180° flipping in the horizontal or vertical direction; random scaling, with the scaling factor in the range [0.5, 2]; random brightness enhancement, with the brightness factor in the range [0.5, 2]; random contrast enhancement, with the contrast factor in the range [0.5, 2]; and random saturation enhancement, with the saturation factor in the range [0.5, 2].
Optionally, the remote sensing image semantic segmentation network is a classical semantic segmentation network UNet, where the UNet includes an encoder and a main decoder, and the number of auxiliary decoders is 3.
Optionally, the hyper-parameters, the loss functions and the learning-rate schedule used in the training process include:

lr = base_lr × warmup_factor^epoch (for epoch < warmup_epoch)

wherein base_lr is 4.2e-6; the warm-up factor warmup_factor is 1.2; epoch is the number of training iterations; lr is the learning rate, which rises gradually as training proceeds; and warmup_epoch is the number of warm-up iterations, set to 30;
a warm start is performed, and after training exceeds warmup_epoch, a polynomial learning-rate decay strategy is used, with the maximum number of training iterations set to 1500, the decay exponent power set to 0.9, and the maximum learning rate max_lr set to 1e-3;
the loss function used for calculating the supervision loss is the cross-entropy loss function, and the loss function used for calculating the consistency loss is the mean-square-error loss function; the overall loss function L_total of the remote sensing image semantic segmentation network is:

L_total = L_sup(ŷ_i, y_i) + λ(epoch) · L_cons(e_m, e_a)

wherein ŷ_i is the prediction result; y_i is the marked image; L_sup is the supervision loss; e_m is the prediction result of the main decoder; e_a is the prediction result of an auxiliary decoder; L_cons is the consistency loss; and λ(epoch) is the weight of L_cons.
In the first training stage, the overall loss is dominated by the supervision loss part; in the second training stage, the overall loss is dominated by the consistency loss, and the specific formula of λ(epoch) is:

λ(epoch) = w · min(epoch / a, 1)

wherein num_epochs is the total number of training iterations; the training round threshold a is set to 200, so that λ(epoch) increases gradually before the a-th training iteration and stabilizes at w once training exceeds a iterations; num_epochs is set to 1500 and w is set to 1.
Optionally, the randomly transforming the unlabeled sample feature map includes:
dropout with probability 0.5; random 180° flipping in the horizontal or vertical direction; and addition of uniformly distributed noise N ~ U(-0.2, 0.2).
Optionally, the fusing the prediction results of the primary decoder and the secondary decoder includes:
voting the prediction result of the main decoder and the prediction results of the auxiliary decoders position by position, and taking the category with the most votes at each pixel as the final fusion result to generate the pseudo mark.
FIG. 5 is a block diagram illustrating a cross-consistency learning remote sensing image semantic segmentation model training system according to an exemplary embodiment. As shown in fig. 5, corresponding to the first training stage, the remote sensing image and the corresponding labeled data are input into the encoder and the main decoder, the supervision loss is calculated according to the output of the main decoder and the labeled data of the remote sensing image, and the parameters of the encoder and the main decoder are preliminarily optimized.
FIG. 6 is a block diagram illustrating a cross-consistency learning remote sensing image semantic segmentation model training system according to an exemplary embodiment. As shown in fig. 6, corresponding to the second training stage described above: in the first part of the second training stage, the encoder extracts a feature tensor from the unmarked remote sensing image data; the feature tensor is then randomly transformed and input into the main decoder and the auxiliary decoders, and the consistency loss is calculated from the prediction result of the main decoder and the prediction results of the auxiliary decoders. In the second part of the second training stage, the prediction results of the main decoder and the auxiliary decoders are fused to obtain pseudo marks, and the pseudo marks and the corresponding remote sensing images are used as marked samples and input again into the encoder and the main decoder of fig. 5 for supervised training.
FIG. 4 is a block diagram illustrating an apparatus for training a semantic segmentation network for cross-consistency self-trained remote sensing images according to an exemplary embodiment. Referring to fig. 4, the apparatus includes a construction module 410, a training module 420, and a segmentation module 430.
The construction module 410 is used for constructing a remote sensing image semantic segmentation network, wherein the remote sensing semantic segmentation network is UNet;
the training module 420 is used for training the remote sensing image semantic segmentation network according to the remote sensing image sample data set;
and the segmentation module 430 is used for inputting the remote sensing image shot in real time into the trained remote sensing image semantic segmentation network and outputting the semantic segmentation result of the remote sensing image.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
FIG. 7 is a block diagram illustrating an apparatus 700 for cross-consistent self-trained remote sensing image semantic segmentation network training, according to an example embodiment.
In an exemplary embodiment, a storage medium comprising instructions is also provided, such as the memory 710 comprising instructions and the interface 730 of the device 700, where the instructions are executable by the processor 720 of the device 700 to perform the above-described method. Alternatively, the storage medium may be a non-transitory computer-readable storage medium, which may be, for example, a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (12)

1. A cross consistency self-training remote sensing image semantic segmentation network training method is characterized by comprising the following steps:
constructing a remote sensing image semantic segmentation network which is UNet;
training the semantic segmentation network of the remote sensing image according to the sample data set of the remote sensing image;
and inputting the remote sensing image shot in real time into the trained remote sensing image semantic segmentation network, and outputting a semantic segmentation result of the remote sensing image.
2. The method of claim 1, wherein training the remote sensing image semantic segmentation network from a remote sensing image dataset comprises:
acquiring a sample data set of the remote sensing image, wherein the sample data set of the remote sensing image comprises a remote sensing image;
marking the ground object categories of pixels in the remote sensing image to generate a corresponding label remote sensing image;
preprocessing the remote sensing image sample data set to obtain a training set, a verification set and a test set by division;
constructing a remote sensing image semantic segmentation network comprising an encoder, a main decoder and a plurality of auxiliary decoders;
and dividing the training set into a marked sample and a non-marked sample, and inputting the marked sample and the non-marked sample into the remote sensing image semantic segmentation network for training.
3. The method of claim 2, wherein the inputting into the remote sensing image semantic segmentation network for training comprises:
determining a hyper-parameter and a loss function used for training the semantic segmentation network of the remote sensing image;
optimizing parameters of the remote sensing image semantic segmentation network until the prediction precision on the verification set reaches a preset precision threshold;
and inputting the test set into a trained remote sensing image semantic segmentation network to verify the accuracy of network segmentation.
4. The method of claim 2, wherein the training of the remote sensing semantic segmentation network comprises:
dividing the training set remote sensing image into a marked sample and an unmarked sample, wherein the marked sample comprises the remote sensing image and a corresponding label remote sensing image, and the unmarked sample only comprises the remote sensing image;
in the first training stage, marked samples and unmarked samples are input into the encoder of the remote sensing semantic segmentation network, the marked-sample feature map extracted by the encoder is input into the main decoder, and the supervision loss is calculated from the obtained prediction result and the label data;
in the first part of the second training stage, the unmarked-sample feature map extracted by the encoder is randomly transformed and input into the main decoder and the auxiliary decoders, and the consistency loss is calculated from the prediction results of the auxiliary decoders and the prediction result of the main decoder;
and in the second part of the second training stage, the prediction results of the main decoder and the auxiliary decoders are fused to obtain pseudo marks, and the pseudo marks and the corresponding remote sensing images are used as marked samples for supervised training.
5. The method of claim 2, wherein the preprocessing the set of remote sensing image sample data comprises:
randomly sampling the remote sensing images in the remote sensing image sample data set into small images for multiple rounds of batch training, wherein 480 samples are drawn in each round of training;
setting the sampling size and the training batch size of the remote sensing images according to the available video memory, wherein the default input image size is 512 × 512 and the default training batch size is 10;
performing multiple rounds of random sampling on the remote sensing images and the label remote sensing images, wherein each sampling obtains a labeled remote sensing image with its corresponding label image, both of size 512 × 512, and an unlabeled image;
performing data enhancement on the samples of each round a random number of times and to a random degree, and setting the enhancement parameter ranges of the training samples, wherein the data enhancement comprises at least one of the following: random rotation by n × 90° (n = 0, 1, 2, 3); random 180° flipping in the horizontal or vertical direction; random scaling, with the scaling factor in the range [0.5, 2]; random brightness enhancement, with the brightness factor in the range [0.5, 2]; random contrast enhancement, with the contrast factor in the range [0.5, 2]; and random saturation enhancement, with the saturation factor in the range [0.5, 2].
6. The method of claim 2, wherein the remote sensing image semantic segmentation network is a classical semantic segmentation network UNet, the UNet comprising an encoder and a primary decoder, and the number of secondary decoders is 3.
7. The method of claim 4, wherein the hyper-parameters, the loss functions and the learning-rate schedule used in the training process comprise:

lr = base_lr × warmup_factor^epoch (for epoch < warmup_epoch)

wherein base_lr is 4.2e-6; the warm-up factor warmup_factor is 1.2; epoch is the number of training iterations; lr is the learning rate, which rises gradually as training proceeds; and warmup_epoch is the number of warm-up iterations, set to 30;
a warm start is performed, and after training exceeds warmup_epoch, a polynomial learning-rate decay strategy is used, with the maximum number of training iterations set to 1500, the decay exponent power set to 0.9, and the maximum learning rate max_lr set to 1e-3;
the loss function used for calculating the supervision loss is the cross-entropy loss function, and the loss function used for calculating the consistency loss is the mean-square-error loss function; the overall loss function L_total of the remote sensing image semantic segmentation network is:

L_total = L_sup(ŷ_i, y_i) + λ(epoch) · L_cons(e_m, e_a)

wherein ŷ_i is the prediction result; y_i is the marked image; L_sup is the supervision loss; e_m is the prediction result of the main decoder; e_a is the prediction result of an auxiliary decoder; L_cons is the consistency loss; and λ(epoch) is the weight of L_cons.
In the first training stage, the overall loss is dominated by the supervision loss part; in the second training stage, the overall loss is dominated by the consistency loss, and the specific formula of λ(epoch) is:

λ(epoch) = w · min(epoch / a, 1)

wherein num_epochs is the total number of training iterations; the training round threshold a is set to 200, so that λ(epoch) increases gradually before the a-th training iteration and stabilizes at w once training exceeds a iterations; num_epochs is set to 1500 and w is set to 1.
8. The method of claim 4, wherein randomly transforming the unlabeled sample feature map comprises:
dropout with probability 0.5; random 180° flipping in the horizontal or vertical direction; and addition of uniformly distributed noise N ~ U(-0.2, 0.2).
9. The method of claim 4, wherein fusing the prediction results of the primary decoder and the secondary decoder comprises:
voting the prediction result of the main decoder and the prediction results of the auxiliary decoders position by position, and taking the category with the most votes at each pixel as the final fusion result to generate the pseudo mark.
10. A cross consistency self-training remote sensing image semantic segmentation network training device is characterized by comprising:
the system comprises a construction module, a semantic segmentation module and a semantic segmentation module, wherein the construction module is used for constructing a remote sensing image semantic segmentation network which is UNet;
the training module is used for training the remote sensing image semantic segmentation network according to the remote sensing image sample data set;
and the segmentation module is used for inputting the remote sensing image shot in real time into the trained remote sensing image semantic segmentation network and outputting the semantic segmentation result of the remote sensing image.
11. A cross consistency self-training remote sensing image semantic segmentation network training device is characterized by comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of any one of claims 1 to 9.
12. A non-transitory computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of a cross-consistent self-trained remote sensing image semantic segmentation network training device, enable the cross-consistent self-trained remote sensing image semantic segmentation network training device to perform the method of any of claims 1-9.
CN202111364685.4A 2021-11-17 2021-11-17 Cross consistency self-training remote sensing image semantic segmentation network training method and device Pending CN114283285A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111364685.4A CN114283285A (en) 2021-11-17 2021-11-17 Cross consistency self-training remote sensing image semantic segmentation network training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111364685.4A CN114283285A (en) 2021-11-17 2021-11-17 Cross consistency self-training remote sensing image semantic segmentation network training method and device

Publications (1)

Publication Number Publication Date
CN114283285A true CN114283285A (en) 2022-04-05

Family

ID=80869687

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111364685.4A Pending CN114283285A (en) 2021-11-17 2021-11-17 Cross consistency self-training remote sensing image semantic segmentation network training method and device

Country Status (1)

Country Link
CN (1) CN114283285A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114708436A (en) * 2022-06-02 2022-07-05 深圳比特微电子科技有限公司 Training method of semantic segmentation model, semantic segmentation method, semantic segmentation device and semantic segmentation medium
CN114708436B (en) * 2022-06-02 2022-09-02 深圳比特微电子科技有限公司 Training method of semantic segmentation model, semantic segmentation method, semantic segmentation device and semantic segmentation medium
CN115049817A (en) * 2022-06-10 2022-09-13 湖南大学 Image semantic segmentation method and system based on cross-image consistency
CN114972313A (en) * 2022-06-22 2022-08-30 北京航空航天大学 Image segmentation network pre-training method and device
CN114972313B (en) * 2022-06-22 2024-04-19 北京航空航天大学 Image segmentation network pre-training method and device
CN115861824A (en) * 2023-02-23 2023-03-28 汕头大学 Remote sensing image identification method based on improved Transformer
CN115861824B (en) * 2023-02-23 2023-06-06 汕头大学 Remote sensing image recognition method based on improved transducer

Similar Documents

Publication Publication Date Title
CN110705457B (en) Remote sensing image building change detection method
CN113449594B (en) Multilayer network combined remote sensing image ground semantic segmentation and area calculation method
CN111986099B (en) Tillage monitoring method and system based on convolutional neural network with residual error correction fused
CN114299380A (en) Remote sensing image semantic segmentation model training method and device for contrast consistency learning
CN113780296B (en) Remote sensing image semantic segmentation method and system based on multi-scale information fusion
CN110889449A (en) Edge-enhanced multi-scale remote sensing image building semantic feature extraction method
CN111612008B (en) Image segmentation method based on convolution network
CN114283285A (en) Cross consistency self-training remote sensing image semantic segmentation network training method and device
CN111950453A (en) Optional-shape text recognition method based on selective attention mechanism
CN114092832A (en) High-resolution remote sensing image classification method based on parallel hybrid convolutional network
CN112950780B (en) Intelligent network map generation method and system based on remote sensing image
CN114926469B (en) Semantic segmentation model training method, semantic segmentation method, storage medium and terminal
CN103714148B (en) SAR image search method based on sparse coding classification
CN113111716B (en) Remote sensing image semiautomatic labeling method and device based on deep learning
CN110599502B (en) Skin lesion segmentation method based on deep learning
CN110675421B (en) Depth image collaborative segmentation method based on few labeling frames
CN112836614B (en) High-resolution remote sensing image classification method based on residual error network and transfer learning
CN114913434B (en) High-resolution remote sensing image change detection method based on global relation reasoning
CN109635726A (en) A kind of landslide identification method based on the symmetrical multiple dimensioned pond of depth network integration
CN117079132A (en) Remote sensing image target detection method based on Gaussian distance loss
CN114332473A (en) Object detection method, object detection device, computer equipment, storage medium and program product
CN114332075A (en) Rapid structural defect identification and classification method based on lightweight deep learning model
CN112329771A (en) Building material sample identification method based on deep learning
CN116580243A (en) Cross-domain remote sensing scene classification method for mask image modeling guide domain adaptation
CN111242028A (en) Remote sensing image ground object segmentation method based on U-Net

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination