CN114283285A - Cross consistency self-training remote sensing image semantic segmentation network training method and device - Google Patents
- Publication number
- CN114283285A (application CN202111364685.4A)
- Authority
- CN
- China
- Prior art keywords
- remote sensing
- sensing image
- training
- semantic segmentation
- segmentation network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The application discloses a cross consistency self-training remote sensing image semantic segmentation network training method and device, and relates to the technical field of remote sensing image segmentation. The specific implementation scheme is as follows: constructing a remote sensing image semantic segmentation network, which is UNet; training the remote sensing image semantic segmentation network on a remote sensing image sample data set; and inputting a remote sensing image shot in real time into the trained remote sensing image semantic segmentation network and outputting the semantic segmentation result of the remote sensing image. According to the embodiments of the application, the remote sensing image is segmented by constructing the remote sensing image semantic segmentation network and training it on the remote sensing image sample data set. The embodiments of the application avoid consuming a large amount of manpower on segmenting and labeling remote sensing images, improve training efficiency, and improve the accuracy of remote sensing image segmentation.
Description
Technical Field
The application relates to the technical field of remote sensing image segmentation, in particular to a cross consistency self-training remote sensing image semantic segmentation network training method and device.
Background
Image segmentation refers to the technique and process of dividing an image into characteristic regions and extracting targets of interest; the remote sensing image is a common and typical color image that has attracted wide attention. Using image segmentation technology to mark each pixel in a remote sensing image as a type of ground object, such as a building, water body, road, farmland or vehicle, has long been a research hotspot. Traditional image segmentation methods (such as the threshold method, k-Means clustering, region methods and edge detection) are only concerned with finding the boundary contour of a ground object, not its category. In recent years, owing to the rapid development of deep learning and the great improvement of computer storage and computing power, semantic segmentation methods based on deep convolutional neural networks have become a new tool for segmenting high-resolution remote sensing images.
However, semantic segmentation based on a deep convolutional neural network can be regarded as a pixel-level classification task: ground object targets in the image need to be densely labeled, which is difficult, and for high-resolution remote sensing images, labeling a semantic segmentation data set requires a large amount of labor and time. This brings great difficulty to remote sensing image semantic segmentation based on deep convolutional neural networks.
Disclosure of Invention
The application provides a cross consistency self-training remote sensing image semantic segmentation network training method and device. The technical scheme of the application is as follows:
according to a first aspect of an embodiment of the application, a cross consistency self-training remote sensing image semantic segmentation network training method is provided, and comprises the following steps:
constructing a remote sensing image semantic segmentation network which is UNet;
training the semantic segmentation network of the remote sensing image according to the sample data set of the remote sensing image;
and inputting the remote sensing image shot in real time into the trained remote sensing image semantic segmentation network, and outputting a semantic segmentation result of the remote sensing image.
Optionally, the training the semantic segmentation network of the remote sensing image according to the remote sensing image data set includes:
acquiring a sample data set of the remote sensing image, wherein the sample data set of the remote sensing image comprises a remote sensing image;
marking the ground object categories of pixels in the remote sensing image to generate a corresponding label remote sensing image;
preprocessing the remote sensing image sample data set to obtain a training set, a verification set and a test set by division;
constructing a remote sensing image semantic segmentation network comprising an encoder, a main decoder and a plurality of auxiliary decoders;
and dividing the training set into a marked sample and a non-marked sample, and inputting the marked sample and the non-marked sample into the remote sensing image semantic segmentation network for training.
Optionally, the inputting into the remote sensing image semantic segmentation network for training includes:
determining a hyper-parameter and a loss function used for training the semantic segmentation network of the remote sensing image;
optimizing parameters of a semantic segmentation network of the remote sensing image until the prediction precision of the verification set prediction result reaches a preset precision threshold;
and inputting the test set into a trained remote sensing image semantic segmentation network to verify the accuracy of network segmentation.
Optionally, the training of the remote sensing semantic segmentation network includes:
dividing the training set remote sensing image into a marked sample and an unmarked sample, wherein the marked sample comprises the remote sensing image and a corresponding label remote sensing image, and the unmarked sample only comprises the remote sensing image;
in the first training stage, a marked sample and an unmarked sample are input into the encoder of the remote sensing semantic segmentation network, the marked sample feature map extracted by the encoder is input into the main decoder, and the supervision loss is calculated from the obtained prediction result and the label data;
in the first part of the second training stage, random transformation is carried out on the unlabelled sample feature map extracted by the encoder, the unlabelled sample feature map is input into a main decoder and an auxiliary decoder, and consistency loss is calculated according to the prediction result of the auxiliary decoder and the prediction result of the main decoder;
and in the second part of the second training stage, the prediction results of the main decoder and the auxiliary decoder are fused to obtain a pseudo mark, and the pseudo mark and the corresponding remote sensing image are used as a mark sample for supervised training.
Optionally, the preprocessing the remote sensing image sample data set includes:
randomly sampling the remote sensing images in the remote sensing image sample data set into small images for multiple rounds of batch training, wherein 480 samples are drawn in each round of training;
setting the sampling size and training batch size of the remote sensing images according to the available video memory, wherein the default input image size is 512 × 512 and the default training batch size is 10;
carrying out multiple rounds of random sampling on the remote sensing images and the label remote sensing images, wherein each sampling obtains a labeled remote sensing image of size 512 × 512 with its corresponding label image, together with an unlabeled image;
carrying out data enhancement on the samples of each round a random number of times and to a random degree, and setting the training sample enhancement parameter ranges, wherein the data enhancement comprises at least one of the following: random rotation by n × 90° (n = 0, 1, 2, 3); random 180° flipping in the horizontal or vertical direction; random scaling, with the scaling factor in the range [0.5, 2]; random brightness enhancement, with the brightness factor in the range [0.5, 2]; random contrast enhancement, with the contrast factor in the range [0.5, 2]; and random saturation enhancement, with the saturation factor in the range [0.5, 2].
Optionally, the remote sensing image semantic segmentation network is a classical semantic segmentation network UNet, where the UNet includes an encoder and a main decoder, and the number of auxiliary decoders is 3.
Optionally, the change strategy of the learning rate among the hyper-parameters, together with the loss functions used in the training process, includes:
a warm start, wherein base_lr = 4.2e-6; the warm start factor warmup_factor = 1.2; epoch is the number of training iterations; lr is the learning rate, which rises gradually as training proceeds; and warmup_epoch, the number of warm start iterations, is set to 30;
after training exceeds warmup_epoch, a polynomial learning rate decay strategy is used, with the maximum number of training iterations set to 1500, the decay exponent power set to 0.9, and the maximum learning rate max_lr set to 1e-3;
the loss function used for calculating the supervision loss is a cross entropy loss function, the loss function used for calculating the consistency loss is a mean square error loss function, and the overall loss function L of the remote sensing image semantic segmentation network is:
L = L_sup(ŷ_i, y_i) + λ(epoch) · L_cons(e_m, e_a)
where ŷ_i is the prediction result; y_i is the marked image; L_sup is the supervision loss; e_m is the prediction result of the main decoder; e_a is the prediction result of an auxiliary decoder; L_cons is the consistency loss; and λ(epoch) is the weight of L_cons.
In the first training stage, the overall loss is dominated by the supervision loss part;
In the second training phase, the overall loss is dominated by the consistency loss, and λ(epoch) follows a ramp-up schedule: the training round threshold a is set to 200; λ(epoch) increases gradually before the a-th training iteration and stabilizes at w once training exceeds a; num_epochs, the total number of training iterations, is set to 1500; and w is set to 1.
Optionally, the randomly transforming the unlabeled sample feature map includes:
dropout with probability 0.5, random 180° flipping in the horizontal or vertical direction, and additive noise N ~ U(-0.2, 0.2) obeying a uniform distribution.
Optionally, the fusing the prediction results of the primary decoder and the secondary decoder includes:
and voting the prediction result of the main decoder and the prediction result of the auxiliary decoder according to positions, and taking the category with the most votes of each pixel as a final fusion result to generate the pseudo mark.
According to a second aspect of the embodiments of the present application, there is provided a cross consistency self-training remote sensing image semantic segmentation network training device, including:
the system comprises a construction module, a semantic segmentation module and a semantic segmentation module, wherein the construction module is used for constructing a remote sensing image semantic segmentation network which is UNet;
the training module is used for training the remote sensing image semantic segmentation network according to the remote sensing image sample data set;
and the segmentation module is used for inputting the remote sensing image shot in real time into the trained remote sensing image semantic segmentation network and outputting the semantic segmentation result of the remote sensing image.
According to a third aspect of the embodiments of the present application, there is provided a cross consistency self-training remote sensing image semantic segmentation network training device, including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of any of the first aspects above.
According to a fourth aspect of embodiments of the present application, there is provided a non-transitory computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of a cross-consistency self-training remote sensing image semantic segmentation network training device, enable the cross-consistency self-training remote sensing image semantic segmentation network training device to perform the method according to any one of the first aspect.
The technical scheme provided by the embodiment of the application at least has the following beneficial effects:
in the second training stage, a small amount of labeled data and a large amount of unlabeled data are utilized: the data is enhanced and the consistency loss is computed between the output of the main decoder and the outputs of the auxiliary decoders, which effectively prevents overfitting of the model and improves its generalization ability. The output results of all decoders are fused to obtain pseudo-labeled images, and the pseudo-labeled images and the corresponding remote sensing images are used as labeled samples for supervised training, which can further improve the performance of the model. Therefore, when labeled data is insufficient, the method and device can train a model with better performance by using a large amount of unlabeled data, reducing the requirement for labeled samples and the labor cost of data labeling.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the application and are not to be construed as limiting the application.
FIG. 1 is a flow diagram illustrating a method for training a cross-consistency self-trained remote sensing image semantic segmentation network according to an exemplary embodiment.
FIG. 2 is a flow diagram illustrating a method for training a cross-consistency self-trained remote sensing image semantic segmentation network according to an exemplary embodiment.
FIG. 3 is a flow diagram illustrating a method for training a cross-consistency self-trained remote sensing image semantic segmentation network according to an exemplary embodiment.
FIG. 4 is a block diagram illustrating a cross-consistency self-trained remote sensing image semantic segmentation network training apparatus according to an exemplary embodiment.
FIG. 5 is a block diagram illustrating a cross-consistency learning remote sensing image semantic segmentation model training system according to an exemplary embodiment.
FIG. 6 is a block diagram illustrating a cross-consistency learning remote sensing image semantic segmentation model training system according to an exemplary embodiment.
FIG. 7 is a block diagram illustrating an apparatus in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present application better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The purpose of image segmentation is to label each pixel with a category; for remote sensing images, this is a type of ground object such as a building, water body, road, farmland or vehicle. Image semantic segmentation developed from traditional image segmentation: traditional methods (the threshold method, k-Means clustering, region methods and edge detection) are only concerned with finding the boundary contour of a ground object, not its category, whereas semantic segmentation must not only accurately find the contour of the ground object but also accurately judge its category, that is, assign it a semantic meaning. Owing to the rapid development of deep learning and the great improvement of computer storage and computing power, semantic segmentation methods based on deep convolutional neural networks have become a cutting-edge tool for segmenting high-resolution remote sensing images.
Semantic segmentation based on a deep convolutional neural network can be regarded as a pixel-level classification task: ground object targets in the image need to be densely labeled, which is difficult, and for high-resolution remote sensing images, labeling a semantic segmentation data set requires a large amount of labor and time. This brings great difficulty to remote sensing image semantic segmentation based on deep convolutional neural networks.
There are two solutions in the related art: 1. The method based on self-training, which mainly proceeds in the following steps: first, a model is trained on the labeled data. Second, the pre-trained model is used to generate pseudo labels for the unlabeled data set. Third, a model is retrained with the true labels of the labeled data set and the pseudo labels of the unlabeled data set. Fourth, the above process is repeated several times. This approach works well, but the repeated training increases the time cost.
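As an illustration of the pseudo-labeling step of this related-art scheme, a minimal PyTorch sketch is given below; the function name, the loader format and the device choice are illustrative assumptions, not details from the patent.

```python
import torch

@torch.no_grad()
def make_pseudo_labels(model, unlabeled_loader, device="cuda"):
    # Step two of the related-art self-training scheme (assumed interface):
    # run the model pre-trained on labeled data over the unlabeled set and
    # keep the per-pixel argmax as the pseudo label.
    model.eval()
    pseudo = []
    for images in unlabeled_loader:
        logits = model(images.to(device))          # (B, C, H, W) class scores
        pseudo.append(logits.argmax(dim=1).cpu())  # (B, H, W) pseudo labels
    return pseudo
```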
2. The method based on consistency learning, whose core is to encourage the model to produce similar outputs for the same sample under different transformations, including random rotation, flipping, color changes and the like. The whole process runs simultaneously with supervised training and works better, but there is still room for improvement.
FIG. 1 is a flow diagram illustrating a method for training a cross-consistency self-trained remote sensing image semantic segmentation network according to an exemplary embodiment. As shown in FIG. 1, the method is used for semantic segmentation of the remote sensing image and comprises the following steps.
Step 101, constructing a remote sensing image semantic segmentation network. In the embodiment of the application, the remote sensing image semantic segmentation network is UNet, which consists of an encoder and a decoder: the encoder in the first half performs feature extraction, and the decoder in the second half performs up-sampling.
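To make the structure concrete, the following is a minimal PyTorch sketch of a UNet-style network with one shared encoder, a main decoder and several auxiliary decoders (used by the second training stage described below). The depth, channel widths, five-class output and the omission of UNet's encoder-decoder skip connections are illustrative assumptions; the patent specifies UNet and three auxiliary decoders but not these details.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Sequential):
    # 3x3 convolution + batch norm + ReLU, the basic building block.
    def __init__(self, cin, cout):
        super().__init__(
            nn.Conv2d(cin, cout, kernel_size=3, padding=1),
            nn.BatchNorm2d(cout),
            nn.ReLU(inplace=True),
        )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=3, num_classes=5, width=32, num_aux=3):
        super().__init__()
        self.enc1 = ConvBlock(in_ch, width)
        self.enc2 = ConvBlock(width, width * 2)
        self.pool = nn.MaxPool2d(2)

        def make_decoder():
            # Up-sampling decoder head ending in a 1x1 classification layer.
            return nn.Sequential(
                nn.ConvTranspose2d(width * 2, width, kernel_size=2, stride=2),
                ConvBlock(width, width),
                nn.Conv2d(width, num_classes, kernel_size=1),
            )

        self.main_decoder = make_decoder()
        self.aux_decoders = nn.ModuleList(make_decoder() for _ in range(num_aux))

    def encode(self, x):
        # Encoder: down-sampling feature extraction.
        return self.enc2(self.pool(self.enc1(x)))

    def forward(self, x):
        # Inference uses the encoder and the main decoder only.
        return self.main_decoder(self.encode(x))
```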
Step 102, training the remote sensing image semantic segmentation network. In the embodiment of the application, after the remote sensing image semantic segmentation network is constructed, it is trained on the collected remote sensing image data set. The training process is divided into a first training stage and a second training stage: in the first training stage, the overall loss is dominated by the supervision loss; in the second training stage, the overall loss is dominated by the consistency loss.
Step 103, inputting the remote sensing image shot in real time into the trained remote sensing image semantic segmentation network, and outputting the semantic segmentation result of the remote sensing image.
In the embodiment of the application, after the training of the remote sensing image semantic segmentation network is finished, the remote sensing image shot in real time can be input into the remote sensing image semantic segmentation network so as to obtain the semantic segmentation result of the remote sensing image.
According to the embodiment of the application, in the second training stage a small amount of labeled data and a large amount of unlabeled data are utilized: the data is enhanced and the consistency loss is computed between the output of the main decoder and the outputs of the auxiliary decoders, which effectively prevents model overfitting and improves the generalization ability of the model. The output results of all decoders are fused to obtain pseudo-labeled images, and the pseudo-labeled images and the corresponding remote sensing images are used as labeled samples for supervised training, which can further improve the performance of the model. Therefore, when labeled data is insufficient, the method and device can train a model with better performance by using a large amount of unlabeled data, reducing the requirement for labeled samples and the labor cost of data labeling.
FIG. 2 is a flow diagram illustrating a method for training a cross-consistency self-trained remote sensing image semantic segmentation network according to an exemplary embodiment. As shown in FIG. 2, the method is used for semantic segmentation of the remote sensing image and comprises the following steps.
Step 201, acquiring a remote sensing image sample data set. In the embodiment of the application, remote sensing images need to be collected as sample data to train the remote sensing image semantic segmentation model, and historical remote sensing data is acquired from a database to form the remote sensing image sample data set.
Step 202, marking the ground object categories of pixels in the remote sensing image to generate a corresponding label remote sensing image.
In the embodiment of the present application, the task of the remote sensing image semantic segmentation model is to identify and mark the surface feature class to which each pixel in the remote sensing image belongs. In one possible embodiment, the surface feature classes include: buildings, water bodies, roads, farmland and vehicles.
Step 203, preprocessing the remote sensing image sample data set. In the embodiment of the application, to improve training efficiency and reduce the workload of manually marking ground object categories, the remote sensing image sample data set is preprocessed: the remote sensing images are sampled to the same size, the ground object categories of pixels are marked, random data enhancement is performed on the sampled images, and so on. The remote sensing image sample data set is divided into a training set, a verification set and a test set. The training set is used to train the remote sensing image semantic segmentation model; the verification set is used to verify whether the trained remote sensing image semantic segmentation model can accurately segment remote sensing images; the test set is used to test the segmentation accuracy of the trained remote sensing image semantic segmentation model.
Step 204, constructing the remote sensing image semantic segmentation network. In the embodiment of the application, the teacher network and the student network have the same structure; both include an encoder and a decoder. The encoder is used for down-sampling: it extracts the high-dimensional features of the remote sensing image to generate a feature tensor. The decoder is used for up-sampling: it reduces the dimension of the feature tensor to generate the segmentation result, which is the surface feature class of each pixel in the remote sensing image together with the probability corresponding to that class.
Step 205, dividing the training set into marked samples and unmarked samples, and inputting them into the remote sensing image semantic segmentation network for training.
The marked samples and the unmarked samples are input into the encoder and main decoder of the remote sensing image semantic segmentation network, and the supervision loss is calculated for training; the unmarked samples are input into the encoder, main decoder and auxiliary decoders of the remote sensing image semantic segmentation network, and a pseudo mark is generated from the outputs of the auxiliary decoders and the prediction result of the main decoder. The pseudo mark and the corresponding remote sensing image are then used as a marked sample for supervised training.
FIG. 3 is a flow diagram illustrating a method for training a cross-consistency self-trained remote sensing image semantic segmentation network according to an exemplary embodiment. As shown in FIG. 3, the method is used for semantic segmentation of the remote sensing image and comprises the following steps.
Step 301, determining the hyper-parameters and the loss functions used for training the remote sensing image semantic segmentation network. In the embodiment of the application, the hyper-parameters of the remote sensing image semantic segmentation model are set before the training process and are continuously optimized during training to improve the performance and effect of remote sensing image segmentation. The loss functions are used to calculate the difference between the segmentation result and the marked sample; the loss is computed according to the loss function to measure the accuracy of the remote sensing image segmentation result.
Step 302, optimizing the parameters of the remote sensing image semantic segmentation network until the prediction precision on the verification set reaches a preset precision threshold. In the embodiment of the application, the remote sensing image semantic segmentation network needs to be optimized to a certain precision. The segmentation precision of the network is verified on the verification set: the semantic segmentation result is compared with the labels in the label remote sensing image, and when the proportion of matching pixels is greater than or equal to the preset precision threshold, training is sufficient and can be stopped.
Step 303, inputting the test set into the trained remote sensing image semantic segmentation network to verify the segmentation accuracy of the network.
In the embodiment of the application, the trained remote sensing image semantic segmentation network is tested on the test set: its segmentation results on the test set are compared with the test set labels, and the accuracy is calculated as the proportion of pixels whose predicted ground object category is the same as the label.
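As a sketch of this accuracy criterion (the loader format and device handling are assumptions), the pixel-matching proportion can be computed as:

```python
import torch

@torch.no_grad()
def pixel_accuracy(model, loader, device="cuda"):
    # Proportion of pixels whose predicted ground object category matches
    # the label image; usable for both the verification and test sets.
    model.eval()
    correct = total = 0
    for images, labels in loader:
        preds = model(images.to(device)).argmax(dim=1)  # (B, H, W)
        correct += (preds == labels.to(device)).sum().item()
        total += labels.numel()
    return correct / total
```

Training stops once this value on the verification set reaches the preset precision threshold.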
Optionally, the training of the remote sensing semantic segmentation network includes:
dividing the training set remote sensing image into a marked sample and an unmarked sample, wherein the marked sample comprises the remote sensing image and a corresponding label remote sensing image, and the unmarked sample only comprises the remote sensing image;
in the first training stage, a marked sample and an unmarked sample are input into the encoder of the remote sensing semantic segmentation network, the marked sample feature map extracted by the encoder is input into the main decoder, and the supervision loss is calculated from the obtained prediction result and the label data;
in the embodiment of the application, the supervision loss is calculated by a cross entropy loss function.
And in the first part of the second training stage, the unlabeled sample feature map extracted by the encoder is subjected to random transformation and is input into a main decoder and an auxiliary decoder, and consistency loss is calculated according to the prediction result of the auxiliary decoder and the prediction result of the main decoder.
In the embodiment of the present application, the random transformation includes dropout with probability 0.5, random 180° flipping in the horizontal or vertical direction, and additive noise N ~ U(-0.2, 0.2) obeying a uniform distribution. The number of auxiliary decoders is 3; the prediction result of each auxiliary decoder is compared with the prediction result of the main decoder to calculate the consistency loss, which is computed with a mean square error loss function. The maximum value of the consistency loss coefficient is set to 1. The consistency loss reflects the difference between the outputs of the auxiliary decoders and the main decoder, so the remote sensing semantic segmentation network is trained with the goal of reducing the consistency loss, that is, the prediction results of the auxiliary decoders are encouraged to be as consistent as possible with the prediction result of the main decoder.
And in the second part of the second training stage, the prediction results of the main decoder and the auxiliary decoder are fused to obtain a pseudo mark, and the pseudo mark and the corresponding remote sensing image are used as a mark sample for supervised training.
In the embodiment of the present application, the prediction results of the main decoder and the auxiliary decoders are fused by voting: the ground object class receiving the most pixel votes is selected to generate the pseudo mark. The pseudo mark can then be used as the mark corresponding to the remote sensing image and input, together with the remote sensing image, into the encoder and main decoder for training. In this way, the number of training samples is increased and the training effect is improved.
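Putting the stages together, one combined optimization step might look like the sketch below. It assumes the TinyUNet interface from the earlier sketch; the use of softmax outputs, the detaching of the main decoder's prediction as the consistency target, and the averaging over auxiliary decoders are plausible readings of the description rather than confirmed details of the patent.

```python
import torch
import torch.nn.functional as F

def train_step(model, x_lab, y_lab, x_unlab, lam, perturb, optimizer):
    optimizer.zero_grad()
    # Supervised term: cross entropy on marked samples (first stage).
    sup_loss = F.cross_entropy(model.main_decoder(model.encode(x_lab)), y_lab)
    # Consistency term: MSE between the main decoder's prediction and each
    # auxiliary decoder's prediction on the perturbed unlabeled feature map.
    feat = model.encode(x_unlab)
    main_prob = model.main_decoder(feat).softmax(dim=1).detach()
    cons_loss = sum(
        F.mse_loss(aux(perturb(feat)).softmax(dim=1), main_prob)
        for aux in model.aux_decoders
    ) / len(model.aux_decoders)
    # Note: spatial perturbations (flips) would additionally require
    # re-aligning the auxiliary and main outputs; see the perturb sketch below.
    loss = sup_loss + lam * cons_loss  # lam is the weight λ(epoch)
    loss.backward()
    optimizer.step()
    return loss.item()
```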
Optionally, the preprocessing the remote sensing image sample data set includes:
randomly sampling the remote sensing images in the remote sensing image sample data set into small images for multiple rounds of batch training, wherein 480 samples are drawn in each round of training;
setting the sampling size and training batch size of the remote sensing images according to the available video memory, wherein the default input image size is 512 × 512 and the default training batch size is 10;
carrying out multiple rounds of random sampling on the remote sensing images and the label remote sensing images, wherein each sampling obtains a labeled remote sensing image of size 512 × 512 with its corresponding label image, together with an unlabeled image;
carrying out data enhancement on the samples of each round a random number of times and to a random degree, and setting the training sample enhancement parameter ranges, wherein the data enhancement comprises at least one of the following: random rotation by n × 90° (n = 0, 1, 2, 3); random 180° flipping in the horizontal or vertical direction; random scaling, with the scaling factor in the range [0.5, 2]; random brightness enhancement, with the brightness factor in the range [0.5, 2]; random contrast enhancement, with the contrast factor in the range [0.5, 2]; and random saturation enhancement, with the saturation factor in the range [0.5, 2].
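A possible implementation of this enhancement pipeline, sketched with torchvision (the patent names no library; random scaling in [0.5, 2] is omitted for brevity, and geometric transforms are applied identically to the label image so that image and label stay aligned):

```python
import random
import torchvision.transforms.functional as TF

def augment(image, label=None):
    k = random.randint(0, 3)  # random rotation by n * 90°, n in {0, 1, 2, 3}
    hflip = random.random() < 0.5  # random 180° flip, horizontal direction
    vflip = random.random() < 0.5  # random 180° flip, vertical direction

    def geometric(t):
        t = TF.rotate(t, 90 * k)
        if hflip:
            t = TF.hflip(t)
        if vflip:
            t = TF.vflip(t)
        return t

    image = geometric(image)
    # Photometric enhancements, factors drawn from [0.5, 2] as listed above.
    image = TF.adjust_brightness(image, random.uniform(0.5, 2.0))
    image = TF.adjust_contrast(image, random.uniform(0.5, 2.0))
    image = TF.adjust_saturation(image, random.uniform(0.5, 2.0))
    return image, (geometric(label) if label is not None else None)
```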
Optionally, the remote sensing image semantic segmentation network is a classical semantic segmentation network UNet, where the UNet includes an encoder and a main decoder, and the number of auxiliary decoders is 3.
Optionally, the change strategy of the learning rate among the hyper-parameters, together with the loss functions used in the training process, includes:
a warm start, wherein base_lr = 4.2e-6; the warm start factor warmup_factor = 1.2; epoch is the number of training iterations; lr is the learning rate, which rises gradually as training proceeds; and warmup_epoch, the number of warm start iterations, is set to 30;
after training exceeds warmup_epoch, a polynomial learning rate decay strategy is used, with the maximum number of training iterations set to 1500, the decay exponent power set to 0.9, and the maximum learning rate max_lr set to 1e-3;
the loss function used for calculating the supervision loss is a cross entropy loss function, the loss function used for calculating the consistency loss is a mean square error loss function, and the overall loss function L of the remote sensing image semantic segmentation network is:
L = L_sup(ŷ_i, y_i) + λ(epoch) · L_cons(e_m, e_a)
where ŷ_i is the prediction result; y_i is the marked image; L_sup is the supervision loss; e_m is the prediction result of the main decoder; e_a is the prediction result of an auxiliary decoder; L_cons is the consistency loss; and λ(epoch) is the weight of L_cons.
In the first training stage, the overall loss is dominated by the supervision loss part;
In the second training phase, the overall loss is dominated by the consistency loss, and λ(epoch) follows a ramp-up schedule: the training round threshold a is set to 200; λ(epoch) increases gradually before the a-th training iteration and stabilizes at w once training exceeds a; num_epochs, the total number of training iterations, is set to 1500; and w is set to 1.
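The schedule can be sketched as follows. The exponential warm start form is inferred from the listed constants (base_lr × 1.2^30 ≈ 1e-3, so the warm start ends near max_lr), and the ramp shape of λ(epoch) is assumed linear, since neither formula survives in the text:

```python
def learning_rate(epoch, base_lr=4.2e-6, warmup_factor=1.2, warmup_epoch=30,
                  max_lr=1e-3, num_epochs=1500, power=0.9):
    # Warm start (assumed exponential), then polynomial decay.
    if epoch < warmup_epoch:
        return base_lr * warmup_factor ** epoch
    return max_lr * (1 - epoch / num_epochs) ** power

def consistency_weight(epoch, a=200, w=1.0):
    # λ(epoch): ramps up (assumed linearly) until round a, then stays at w.
    return w * min(epoch / a, 1.0)
```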
Optionally, the randomly transforming the unlabeled sample feature map includes:
dropout with probability 0.5, random 180° flipping in the horizontal or vertical direction, and additive noise N ~ U(-0.2, 0.2) obeying a uniform distribution.
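A sketch of this feature-map perturbation (the flip-alignment bookkeeping mentioned in the comment is an implementation detail the patent does not discuss):

```python
import torch
import torch.nn.functional as F

def perturb(feat):
    # Dropout with p = 0.5 on the unlabeled feature map.
    feat = F.dropout(feat, p=0.5)
    # Random 180° flips; if applied, the auxiliary output should be flipped
    # back (or the consistency target flipped the same way) before the loss.
    if torch.rand(()) < 0.5:
        feat = torch.flip(feat, dims=[-1])  # horizontal direction
    if torch.rand(()) < 0.5:
        feat = torch.flip(feat, dims=[-2])  # vertical direction
    # Additive noise N ~ U(-0.2, 0.2).
    return feat + torch.empty_like(feat).uniform_(-0.2, 0.2)
```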
Optionally, the fusing the prediction results of the primary decoder and the secondary decoder includes:
and voting the prediction result of the main decoder and the prediction result of the auxiliary decoder according to positions, and taking the category with the most votes of each pixel as a final fusion result to generate the pseudo mark.
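The voting fusion can be sketched as follows, assuming each decoder's prediction has already been reduced to a per-pixel class map of shape (B, H, W):

```python
import torch

def fuse_by_voting(main_pred, aux_preds):
    # Stack the main and auxiliary class maps and take the per-pixel
    # majority class as the pseudo mark.
    votes = torch.stack([main_pred, *aux_preds])  # (1 + num_aux, B, H, W)
    pseudo_mark, _ = votes.mode(dim=0)
    return pseudo_mark
```

With one main and three auxiliary decoders there are four votes per pixel; ties are broken by torch.mode in favor of the smallest class index, a detail the patent leaves unspecified.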
FIG. 5 is a block diagram illustrating a cross-consistency learning remote sensing image semantic segmentation model training system according to an exemplary embodiment. As shown in fig. 5, corresponding to the first training stage, the remote sensing image and the corresponding labeled data are input into the encoder and the main decoder, the supervision loss is calculated according to the output of the main decoder and the labeled data of the remote sensing image, and the parameters of the encoder and the main decoder are preliminarily optimized.
FIG. 6 is a block diagram illustrating a cross-consistency learning remote sensing image semantic segmentation model training system according to an exemplary embodiment. As shown in FIG. 6, it corresponds to the second training phase described above. In the first part of the second training stage, the encoder extracts a feature tensor from unmarked remote sensing image data; the feature tensor is then randomly transformed and input into the main decoder and the auxiliary decoders, and the consistency loss is calculated from the prediction result of the main decoder and the prediction results of the auxiliary decoders. In the second part of the second training stage, the prediction results of the main decoder and the auxiliary decoders are fused to obtain a pseudo mark, and the pseudo mark and the corresponding remote sensing image are used as a marked sample and input again into the encoder and main decoder of FIG. 5 for supervised training.
FIG. 4 is a block diagram illustrating an apparatus for training a semantic segmentation network for cross-consistency self-trained remote sensing images according to an exemplary embodiment. Referring to fig. 4, the apparatus includes a construction module 410, a training module 420, and a segmentation module 430.
The construction module 410 is used for constructing a remote sensing image semantic segmentation network, wherein the remote sensing semantic segmentation network is UNet;
the training module 420 is used for training the remote sensing image semantic segmentation network according to the remote sensing image sample data set;
and the segmentation module 430 is used for inputting the remote sensing image shot in real time into the trained remote sensing image semantic segmentation network and outputting the semantic segmentation result of the remote sensing image.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
FIG. 7 is a block diagram illustrating an apparatus 700 for cross-consistent self-trained remote sensing image semantic segmentation network training, according to an example embodiment.
In an exemplary embodiment, a storage medium comprising instructions, such as memory 710 comprising instructions, interface 730, executable by processor 720 of device 700 to perform the above-described method is also provided. Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
Claims (12)
1. A cross consistency self-training remote sensing image semantic segmentation network training method is characterized by comprising the following steps:
constructing a remote sensing image semantic segmentation network which is UNet;
training the semantic segmentation network of the remote sensing image according to the sample data set of the remote sensing image;
and inputting the remote sensing image shot in real time into the trained remote sensing image semantic segmentation network, and outputting a semantic segmentation result of the remote sensing image.
2. The method of claim 1, wherein training the remote sensing image semantic segmentation network from a remote sensing image dataset comprises:
acquiring a sample data set of the remote sensing image, wherein the sample data set of the remote sensing image comprises a remote sensing image;
marking the ground object categories of pixels in the remote sensing image to generate a corresponding label remote sensing image;
preprocessing the remote sensing image sample data set to obtain a training set, a verification set and a test set by division;
constructing a remote sensing image semantic segmentation network comprising an encoder, a main decoder and a plurality of auxiliary decoders;
and dividing the training set into a marked sample and a non-marked sample, and inputting the marked sample and the non-marked sample into the remote sensing image semantic segmentation network for training.
3. The method of claim 2, wherein the inputting into the remote sensing image semantic segmentation network for training comprises:
determining a hyper-parameter and a loss function used for training the semantic segmentation network of the remote sensing image;
optimizing parameters of a semantic segmentation network of the remote sensing image until the prediction precision of the verification set prediction result reaches a preset precision threshold;
and inputting the test set into a trained remote sensing image semantic segmentation network to verify the accuracy of network segmentation.
4. The method of claim 2, wherein the training of the remote sensing semantic segmentation network comprises:
dividing the training set remote sensing image into a marked sample and an unmarked sample, wherein the marked sample comprises the remote sensing image and a corresponding label remote sensing image, and the unmarked sample only comprises the remote sensing image;
in the first training stage, a marked sample and an unmarked sample are input into the encoder of the remote sensing semantic segmentation network, the marked sample feature map extracted by the encoder is input into the main decoder, and the supervision loss is calculated from the obtained prediction result and the label data;
in the first part of the second training stage, random transformation is carried out on the unlabelled sample feature map extracted by the encoder, the unlabelled sample feature map is input into a main decoder and an auxiliary decoder, and consistency loss is calculated according to the prediction result of the auxiliary decoder and the prediction result of the main decoder;
and in the second part of the second training stage, the prediction results of the main decoder and the auxiliary decoder are fused to obtain a pseudo mark, and the pseudo mark and the corresponding remote sensing image are used as a mark sample for supervised training.
5. The method of claim 2, wherein the preprocessing the set of remote sensing image sample data comprises:
randomly sampling the remote sensing images in the remote sensing image sample data set into small images for multiple rounds of batch training, wherein 480 samples are drawn in each round of training;
setting the sampling size and training batch size of the remote sensing images according to the available video memory, wherein the default input image size is 512 × 512 and the default training batch size is 10;
carrying out multiple rounds of random sampling on the remote sensing images and the label remote sensing images, wherein each sampling obtains a labeled remote sensing image of size 512 × 512 with its corresponding label image, together with an unlabeled image;
carrying out data enhancement on the samples of each round a random number of times and to a random degree, and setting the training sample enhancement parameter ranges, wherein the data enhancement comprises at least one of the following: random rotation by n × 90° (n = 0, 1, 2, 3); random 180° flipping in the horizontal or vertical direction; random scaling, with the scaling factor in the range [0.5, 2]; random brightness enhancement, with the brightness factor in the range [0.5, 2]; random contrast enhancement, with the contrast factor in the range [0.5, 2]; and random saturation enhancement, with the saturation factor in the range [0.5, 2].
6. The method of claim 2, wherein the remote sensing image semantic segmentation network is a classical semantic segmentation network UNet, the UNet comprising an encoder and a primary decoder, and the number of secondary decoders is 3.
7. The method of claim 4, wherein the change strategy of the learning rate among the hyper-parameters, together with the loss functions used in the training process, comprises:
a warm start, wherein base_lr = 4.2e-6; the warm start factor warmup_factor = 1.2; epoch is the number of training iterations; lr is the learning rate, which rises gradually as training proceeds; and warmup_epoch, the number of warm start iterations, is set to 30;
after training exceeds warmup_epoch, a polynomial learning rate decay strategy is used, with the maximum number of training iterations set to 1500, the decay exponent power set to 0.9, and the maximum learning rate max_lr set to 1e-3;
the loss function used for calculating the supervision loss is a cross entropy loss function, the loss function used for calculating the consistency loss is a mean square error loss function, and the overall loss function L of the remote sensing image semantic segmentation network is:
L = L_sup(ŷ_i, y_i) + λ(epoch) · L_cons(e_m, e_a)
where ŷ_i is the prediction result; y_i is the marked image; L_sup is the supervision loss; e_m is the prediction result of the main decoder; e_a is the prediction result of an auxiliary decoder; L_cons is the consistency loss; and λ(epoch) is the weight of L_cons.
In the first training stage, the overall loss is dominated by the supervision loss part;
In the second training phase, the overall loss is dominated by the consistency loss, and λ(epoch) follows a ramp-up schedule: the training round threshold a is set to 200; λ(epoch) increases gradually before the a-th training iteration and stabilizes at w once training exceeds a; num_epochs, the total number of training iterations, is set to 1500; and w is set to 1.
8. The method of claim 4, wherein randomly transforming the unlabeled sample feature map comprises:
dropout with probability 0.5, random 180° flipping in the horizontal or vertical direction, and additive noise N ~ U(-0.2, 0.2) obeying a uniform distribution.
9. The method of claim 4, wherein fusing the prediction results of the primary decoder and the secondary decoder comprises:
and voting the prediction result of the main decoder and the prediction result of the auxiliary decoder according to positions, and taking the category with the most votes of each pixel as a final fusion result to generate the pseudo mark.
10. A cross consistency self-training remote sensing image semantic segmentation network training device is characterized by comprising:
the system comprises a construction module, a semantic segmentation module and a semantic segmentation module, wherein the construction module is used for constructing a remote sensing image semantic segmentation network which is UNet;
the training module is used for training the remote sensing image semantic segmentation network according to the remote sensing image sample data set;
and the segmentation module is used for inputting the remote sensing image shot in real time into the trained remote sensing image semantic segmentation network and outputting the semantic segmentation result of the remote sensing image.
11. A cross consistency self-training remote sensing image semantic segmentation network training device is characterized by comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of any one of claims 1 to 9.
12. A non-transitory computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of a cross-consistent self-trained remote sensing image semantic segmentation network training device, enable the cross-consistent self-trained remote sensing image semantic segmentation network training device to perform the method of any of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111364685.4A CN114283285A (en) | 2021-11-17 | 2021-11-17 | Cross consistency self-training remote sensing image semantic segmentation network training method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111364685.4A CN114283285A (en) | 2021-11-17 | 2021-11-17 | Cross consistency self-training remote sensing image semantic segmentation network training method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114283285A true CN114283285A (en) | 2022-04-05 |
Family
ID=80869687
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111364685.4A Pending CN114283285A (en) | 2021-11-17 | 2021-11-17 | Cross consistency self-training remote sensing image semantic segmentation network training method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114283285A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114708436A (en) * | 2022-06-02 | 2022-07-05 | 深圳比特微电子科技有限公司 | Training method of semantic segmentation model, semantic segmentation method, semantic segmentation device and semantic segmentation medium |
CN114708436B (en) * | 2022-06-02 | 2022-09-02 | 深圳比特微电子科技有限公司 | Training method of semantic segmentation model, semantic segmentation method, semantic segmentation device and semantic segmentation medium |
CN115049817A (en) * | 2022-06-10 | 2022-09-13 | 湖南大学 | Image semantic segmentation method and system based on cross-image consistency |
CN114972313A (en) * | 2022-06-22 | 2022-08-30 | 北京航空航天大学 | Image segmentation network pre-training method and device |
CN114972313B (en) * | 2022-06-22 | 2024-04-19 | 北京航空航天大学 | Image segmentation network pre-training method and device |
CN115861824A (en) * | 2023-02-23 | 2023-03-28 | 汕头大学 | Remote sensing image identification method based on improved Transformer |
CN115861824B (en) * | 2023-02-23 | 2023-06-06 | 汕头大学 | Remote sensing image recognition method based on improved transducer |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||