WO2022199225A1 - Decoding method, apparatus and computer-readable storage medium - Google Patents

Decoding method, apparatus and computer-readable storage medium

Info

Publication number
WO2022199225A1
WO2022199225A1 (PCT/CN2022/071371)
Authority
WO
WIPO (PCT)
Prior art keywords
image
target domain
domain image
segmentation model
target
Prior art date
Application number
PCT/CN2022/071371
Other languages
English (en)
French (fr)
Inventor
陶大程
兰猛
Original Assignee
北京沃东天骏信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京沃东天骏信息技术有限公司
Publication of WO2022199225A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology

Definitions

  • the present disclosure relates to the field of computer technology, and in particular, to an image segmentation method, an apparatus, and a computer-readable storage medium.
  • the segmentation of remote sensing road images aims to separate road information from complex high-resolution remote sensing images, which is a very challenging task. It is an important research direction in the field of remote sensing, and also has important applications in daily life, such as vehicle navigation, map information update, urban planning and disaster relief.
  • an image segmentation method is provided, including: inputting a source domain image and a target domain image into a first generative adversarial network respectively, and training the first generative adversarial network based on adversarial learning to obtain a trained first generative adversarial network, wherein the first generator in the first generative adversarial network is a first image segmentation model; dividing the target domain images into a first set and a second set, wherein, after the target domain images in the first set pass through the first image segmentation model in the trained first generative adversarial network, the obtained segmentation results are used as label information, and the target domain images in the second set are not provided with label information; and inputting the target domain images in the first set and the second set respectively into a second generative adversarial network, and training the second generative adversarial network based on adversarial learning to obtain a trained second generative adversarial network, so as to determine the parameters of a second image segmentation model in the second generative adversarial network, wherein the second generator in the second generative adversarial network is the second image segmentation model, the second image segmentation model has the same structure as the first image segmentation model, and the initial parameters of the second image segmentation model are assigned the parameters of the first generator in the trained first generative adversarial network.
  • respectively inputting the source domain image and the target domain image into the first generative adversarial network includes: inputting the source domain image and the target domain image into the feature extraction layer of the first image segmentation model respectively, to obtain a first feature map of the source domain image and a second feature map of the target domain image; inputting the first feature map and the second feature map into the upsampling layer of the first image segmentation model respectively, to obtain a third feature map of the source domain image with the same size as the source domain image and a fourth feature map of the target domain image with the same size as the target domain image; and inputting the third feature map and the fourth feature map into the softmax layer of the first image segmentation model respectively, to obtain the segmentation result of the source domain image and the segmentation result of the target domain image.
  • the feature extraction layer of the first image segmentation model includes a plurality of layers; in the order in which the source domain image or the target domain image passes through them, the downsampling factor of each feature extraction layer increases successively, and the last two feature extraction layers are each connected to an atrous convolution pyramid pooling module; when the downsampling factor of one or more of the feature extraction layers exceeds a threshold, atrous convolution is used in the one or more feature extraction layers so that their downsampling factor is kept at the threshold.
  • inputting the source domain image and the target domain image into the feature extraction layer of the first image segmentation model respectively to obtain the first feature map of the source domain image and the second feature map of the target domain image includes: inputting the source domain image and the target domain image into each feature extraction layer in sequence; inputting the features output by the last two feature extraction layers that the source domain image passes through into the atrous convolution pyramid pooling module respectively, to obtain the first multi-scale global feature and the second multi-scale global feature of the source domain image corresponding to the two feature layers; fusing the first multi-scale global feature and the second multi-scale global feature to obtain the first feature map of the source domain image; inputting the features output by the last two feature extraction layers that the target domain image passes through into the atrous convolution pyramid pooling module respectively, to obtain the third multi-scale global feature and the fourth multi-scale global feature of the target domain image corresponding to the two feature layers; and fusing the third multi-scale global feature and the fourth multi-scale global feature to obtain the second feature map of the target domain image.
  • training the first generative adversarial network based on adversarial learning includes: in each training epoch, adjusting the parameters of the first image segmentation model according to the difference between the segmentation result of the source domain image after passing through the first image segmentation model and the label information of the source domain image; readjusting the first image segmentation model based on adversarial learning, and adjusting the parameters of the first discriminator in the first generative adversarial network; and repeating the above process until the training of the first generative adversarial network is completed.
  • readjusting the first image segmentation model based on adversarial learning includes: inputting the segmentation result of the target domain image after passing through the first image segmentation model into the first discriminator, and performing domain category discrimination on the target domain image to obtain the discrimination result of the target domain image; determining a first adversarial loss function according to the discrimination result of the target domain image; and readjusting the first image segmentation model according to the first adversarial loss function.
  • adjusting the parameters of the first discriminator in the first generative adversarial network includes: inputting the segmentation result of the source domain image after passing through the first image segmentation model and the segmentation result of the target domain image after passing through the first image segmentation model into the first discriminator respectively, and performing domain category discrimination on the source domain image and the target domain image respectively, to obtain the discrimination result of the source domain image and the discrimination result of the target domain image; determining a first cross-entropy loss function according to the discrimination result of the source domain image and the discrimination result of the target domain image; and adjusting the parameters of the first discriminator according to the first cross-entropy loss function.
  • dividing the target domain images into the first set and the second set includes: inputting the target domain images into the first image segmentation model in the trained first generative adversarial network, obtaining the segmentation results of the target domain images, and generating pseudo-labels of the target domain images, wherein the pseudo-label of a target domain image is used to mark each pixel in the target domain image as belonging to the target or the background; determining the score of each target domain image according to its segmentation result and its pseudo-label, wherein the higher the probability values, in the segmentation result, of the pixels belonging to the target, and the fewer the pixels marked as belonging to the target by the pseudo-label, the higher the score of the target domain image; and, according to the score of each target domain image, selecting some target domain images to generate the first set, the target domain images outside the first set generating the second set, wherein the pseudo-labels of the target domain images in the first set are used as their label information.
  • the score of the target domain image is determined using formula (4), which is discussed in the detailed description below.
  • training the second generative adversarial network based on adversarial learning includes: in each training epoch, adjusting the parameters of the second image segmentation model according to the difference between the segmentation results of the target domain images in the first set after passing through the second image segmentation model and the label information of the target domain images in the first set; readjusting the second image segmentation model based on adversarial learning, and adjusting the parameters of the second discriminator in the second generative adversarial network; and repeating the above process until the training of the second generative adversarial network is completed.
  • readjusting the second image segmentation model based on adversarial learning includes: inputting the segmentation results of the target domain images in the second set after passing through the second image segmentation model into the second discriminator, and performing domain category discrimination on the target domain images in the second set to obtain the discrimination results of the target domain images in the second set; determining a second adversarial loss function according to the discrimination results of the target domain images in the second set; and readjusting the second image segmentation model according to the second adversarial loss function.
  • adjusting the parameters of the second discriminator in the second generative adversarial network includes: inputting the segmentation results of the target domain images in the first set after passing through the second image segmentation model and the segmentation results of the target domain images in the second set after passing through the second image segmentation model into the second discriminator respectively, and performing domain category discrimination on the target domain images in the first set and the second set respectively, to obtain the discrimination results of the target domain images in the first set and the discrimination results of the target domain images in the second set; determining a second cross-entropy loss function according to the discrimination results of the target domain images in the first set and the discrimination results of the target domain images in the second set; and adjusting the parameters of the second discriminator according to the second cross-entropy loss function.
  • the source domain image and the target domain image are remote sensing satellite images including roads; the method further includes: cropping a training sample image and the label mask image of the training sample image into a plurality of training sample image blocks and a plurality of label mask image blocks according to a preset size, respectively, wherein the training sample images include remote sensing satellite images, the value of each pixel in the label mask image is 0 or 1, 0 indicates that the pixel in the training sample image at the same position as the pixel in the label mask image belongs to the road, and 1 indicates that the pixel in the training sample image at the same position as the pixel in the label mask image belongs to the background; selecting the label mask image blocks in which the number of pixels with value 1 exceeds a preset number, together with the corresponding training sample image blocks; performing data augmentation on the training sample image blocks corresponding to the selected label mask image blocks to increase the number of training sample image blocks and obtain preprocessed training sample image blocks; and dividing the preprocessed training sample image blocks into source domain images and target domain images.
  • the method further includes: inputting an image to be segmented into the second image segmentation model whose parameters have been determined, to obtain a segmentation result of the image to be segmented.
  • an apparatus for image segmentation is provided, including: a first training module configured to input a source domain image and a target domain image into a first generative adversarial network respectively, and train the first generative adversarial network based on adversarial learning to obtain a trained first generative adversarial network, wherein the first generator in the first generative adversarial network is the first image segmentation model;
  • a division module configured to divide the target domain images into a first set and a second set, wherein, after the target domain images in the first set pass through the first image segmentation model in the trained first generative adversarial network, the obtained segmentation results are used as label information, and the target domain images in the second set are not provided with label information;
  • a second training module configured to input the target domain images in the first set and the second set respectively into a second generative adversarial network, and train the second generative adversarial network based on adversarial learning to obtain a trained second generative adversarial network, thereby determining the parameters of the second image segmentation model in the second generative adversarial network, wherein the second generator in the second generative adversarial network is the second image segmentation model, the second image segmentation model has the same structure as the first image segmentation model, and the initial parameters of the second image segmentation model are assigned the parameters of the first generator in the trained first generative adversarial network.
  • an image segmentation apparatus is provided, comprising: a processor; and a memory coupled to the processor for storing instructions which, when executed by the processor, cause the processor to execute the image segmentation method of any of the foregoing embodiments.
  • a non-transitory computer-readable storage medium having a computer program stored thereon, wherein, when the program is executed by a processor, the image segmentation method of any of the foregoing embodiments is implemented.
  • FIG. 1 shows a schematic flowchart of an image segmentation method according to some embodiments of the present disclosure.
  • FIG. 2 shows a schematic structural diagram of a model of some embodiments of the present disclosure.
  • FIG. 3 shows a schematic flowchart of an image segmentation method according to other embodiments of the present disclosure.
  • FIG. 4 shows a schematic structural diagram of an image segmentation apparatus according to some embodiments of the present disclosure.
  • FIG. 5 shows a schematic structural diagram of an image segmentation apparatus according to other embodiments of the present disclosure.
  • FIG. 6 shows a schematic structural diagram of an image segmentation apparatus according to further embodiments of the present disclosure.
  • the gap between the source domain images and the target domain images is large, and the gaps between the individual remote sensing images within the target domain are also relatively large.
  • if the model is directly trained with an inter-domain adaptation method, the trained model is only a rough preliminary model on the target domain and cannot be used accurately for the subsequent road segmentation of remote sensing satellite images.
  • a technical problem to be solved by the present disclosure is: how to improve the accuracy of image segmentation.
  • the present disclosure proposes an image segmentation method that is suitable not only for road segmentation of remote sensing satellite images, but also for other scenes in which the images are complex, the gap between the source domain and target domain images is large, and the gaps between the images within the target domain are also relatively large.
  • the image segmentation method of the present disclosure will be described below with reference to FIGS. 1 to 3 .
  • FIG. 1 is a flowchart of some embodiments of the disclosed image segmentation method. As shown in FIG. 1, the method of this embodiment includes steps S102-S108.
  • step S102 the source domain image and the target domain image are respectively input into the first generative adversarial network, and the first generative adversarial network is trained based on the adversarial learning to obtain the trained first generative adversarial network.
  • the source domain image and the target domain image are remote sensing satellite images including roads.
  • the entire network includes a first generative adversarial network and a second generative adversarial network
  • the first generative adversarial network includes a first generator and a first discriminator
  • the first generator is a (first) image segmentation model
  • the second generative adversarial network includes a second generator and a second discriminator
  • the second generator is a second image segmentation model
  • the (first) image segmentation model includes a feature extraction layer, an upsampling layer and a softmax layer.
  • an existing network model can be used as the backbone network for feature extraction, that is, as the feature extraction layer, for example, ResNet-101 or ResNet-50; the examples are not enumerated exhaustively here.
  • the upsampling layer is used to upsample the feature map (or feature tensor) output by the feature extraction layer and restore it to the same size as the input image.
  • the upsampled feature map has two channels in the depth direction; after normalization by the softmax layer, each pixel corresponds to a binary vector representing the probability values predicted for the background and the target (for example, a road), respectively, which constitute the segmentation result.
  • the source domain image and the target domain image are respectively input into the feature extraction layer of the (first) image segmentation model to obtain the first feature map of the source domain image and the second feature map of the target domain image; the first feature map and the second feature map are respectively input into the upsampling layer of the (first) image segmentation model to obtain the third feature map of the source domain image, with the same size as the source domain image, and the fourth feature map of the target domain image, with the same size as the target domain image; the third feature map and the fourth feature map are respectively input into the softmax layer of the (first) image segmentation model to obtain the segmentation result of the source domain image and the segmentation result of the target domain image. A minimal sketch of this forward pass is given below.
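  • The following is a minimal sketch of the forward pass just described (feature extraction, upsampling to the input size, softmax), assuming PyTorch; the class and module names are hypothetical and not the patent's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegmentationModel(nn.Module):
    """Hypothetical sketch: feature extraction -> upsampling -> softmax."""

    def __init__(self, backbone, feat_channels, num_classes=2):
        super().__init__()
        self.backbone = backbone                    # e.g. a ResNet-101 feature extractor
        self.classifier = nn.Conv2d(feat_channels, num_classes, kernel_size=1)

    def forward(self, x):
        feats = self.backbone(x)                    # spatially reduced feature map
        logits = self.classifier(feats)
        # Upsampling layer: restore the map to the same size as the input image.
        logits = F.interpolate(logits, size=x.shape[-2:],
                               mode="bilinear", align_corners=False)
        # Softmax layer: each pixel becomes a 2-vector of background/target
        # probabilities, which serves as the segmentation result.
        return torch.softmax(logits, dim=1)
```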
  • the segmentation result of the source domain image and the segmentation result of the target domain image can be input to the first discriminator for discrimination, which will be described in subsequent embodiments.
  • the present disclosure also proposes a method for improving the (first) image segmentation model.
  • the feature extraction layer of the (first) image segmentation model includes a plurality of layers, and the downsampling factors of the feature extraction layers increase successively in the order in which the source domain image or the target domain image passes through them.
  • the last two feature extraction layers are respectively connected to the atrous convolution pyramid pooling module (Atrous Spatial Pyramid Pooling, ASPP).
  • the features output by the last two feature extraction layers that the source domain image passes through are input into the atrous convolution pyramid pooling module respectively, obtaining the first multi-scale global feature and the second multi-scale global feature of the source domain image corresponding to the two feature layers; these two multi-scale global features of the source domain image are fused to obtain the first feature map of the source domain image.
  • the features output by the last two feature extraction layers that the target domain image passes through are input into the atrous convolution pyramid pooling module respectively, obtaining the third multi-scale global feature and the fourth multi-scale global feature of the target domain image corresponding to the two feature layers; these two multi-scale global features of the target domain image are fused to obtain the second feature map of the target domain image.
  • the (first) image segmentation model uses ResNet-101 as the backbone network for feature extraction, and an image (source domain image or target domain image) yields hierarchical features (C1, C2, C3, C4, C5) as it passes through the feature extraction layers. Since the last feature extraction layer uses 32x downsampling, road information would be lost, while maintaining large-scale features would increase the amount of computation and reduce the receptive field of the model. Therefore, atrous convolution can be used in the last feature extraction layer to keep the downsampling factor at the threshold while expanding the receptive field of the model. The inventors have found through testing that the overall effect of the model is better when the threshold is 16. Further, the atrous convolution pyramid pooling module is applied to the last two levels of features, C4 and C5, to obtain multi-scale global information; after fusion, upsampling, and normalization, the segmentation result of the image is obtained.
  • an atrous convolution pyramid pooling module is designed for road features in remote sensing road images to improve the robustness and effective representation of road information during feature extraction and upsampling, thereby improving the performance of remote sensing road segmentation.
  • the atrous convolution pyramid pooling module fully balances model performance and model complexity. By using atrous convolution, the deep features maintain a large resolution to ensure that road information is not lost, and multi-level feature fusion then further enhances the road characterization ability, improves the identifiability of road information, and improves the prediction accuracy of the final segmentation result. A sketch of such a module is given below.
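  • The following is a minimal ASPP sketch in the spirit of the description above: parallel atrous convolutions at several dilation rates plus a global pooling branch, applied to a feature level such as C4 or C5. PyTorch, the channel sizes, and the dilation rates are assumptions, not the patent's exact module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Hypothetical atrous convolution pyramid pooling (ASPP) sketch."""

    def __init__(self, in_ch, out_ch=256, rates=(1, 6, 12, 18)):
        super().__init__()
        # One branch per dilation rate: a 1x1 conv for rate 1, atrous 3x3 otherwise.
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3 if r > 1 else 1,
                      padding=r if r > 1 else 0, dilation=r)
            for r in rates
        ])
        # Image-level (global) context branch.
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(in_ch, out_ch, kernel_size=1))
        self.project = nn.Conv2d(out_ch * (len(rates) + 1), out_ch, kernel_size=1)

    def forward(self, x):
        ys = [branch(x) for branch in self.branches]
        g = F.interpolate(self.pool(x), size=x.shape[-2:],
                          mode="bilinear", align_corners=False)
        # Concatenating all branches yields the multi-scale global feature.
        return self.project(torch.cat(ys + [g], dim=1))
```

  • The two multi-scale global features produced from C4 and C5 could then be fused, for example by concatenation followed by a 1x1 convolution; the fusion operator is not specified in the text above, so this choice is an assumption.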
  • the (first) image segmentation model can output the segmentation result of each source domain image and the segmentation result of each target domain image; the segmentation result includes, for example, the probability of each pixel belonging to the target and to the background, as shown in FIG. 2.
  • the training of the first generative adversarial network may include: in each training epoch (Epoch), adjusting the parameters of the (first) image segmentation model according to the difference between the segmentation result of the source domain image after passing through the (first) image segmentation model and the label information of the source domain image; readjusting the (first) image segmentation model based on adversarial learning, and adjusting the parameters of the first discriminator in the first generative adversarial network; and repeating the above process until the training of the first generative adversarial network is completed.
  • the segmentation loss function of the source domain can be determined according to the difference between the segmentation result of the source domain image after passing through the (first) image segmentation model and the label information of the source domain image, and the parameters of the (first) image segmentation model are adjusted based on the source-domain segmentation loss function.
  • the segmentation loss function of the source domain can adopt the cross-entropy loss function.
  • the annotated source domain image set is represented as (X s , Y s )
  • X s represents the source domain image data
  • Y s represents the annotation information
  • the unlabeled target domain image set is represented as X t .
  • the (first) image segmentation model, serving as the first generator, is denoted G_inter.
  • the segmentation loss function of the source domain can be represented by the following formula.
  • (h, w) is the two-dimensional position of each pixel in the source domain image
  • c is the number of categories to be divided, that is, the number of channels
  • y_s represents the label information of each pixel in each source domain image, where, for example, 1 represents the target and 0 represents the background, and x_s represents the data of each source domain image.
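  • The segmentation loss formula itself is rendered as an image in the original publication; a standard per-pixel cross-entropy consistent with the definitions above (an assumed reconstruction, not the verified published form) is:

$$\mathcal{L}_{seg}(X_s, Y_s) = -\sum_{h,w}\sum_{c} y_s^{(h,w,c)} \log\left(G_{inter}(x_s)^{(h,w,c)}\right)$$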
  • readjusting the (first) image segmentation model based on adversarial learning includes: inputting the segmentation result of the target domain image after passing through the (first) image segmentation model into the first discriminator, and performing domain category discrimination on the target domain image to obtain the discrimination result of the target domain image; determining the first adversarial loss function according to the discrimination result of the target domain image; and readjusting the (first) image segmentation model according to the first adversarial loss function.
  • the first adversarial loss function can be determined by the following formula.
  • D inter ( ⁇ ) represents the first discriminator function
  • (h, w) is the two-dimensional position of each pixel in the target domain image.
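  • The first adversarial loss formula is likewise rendered as an image in the original; a standard form consistent with the definitions above (an assumed reconstruction), in which the generator is rewarded when the discriminator judges target-domain segmentation results to be source-domain, is:

$$\mathcal{L}_{adv}(X_t) = -\sum_{h,w} \log\left(D_{inter}\left(G_{inter}(x_t)\right)^{(h,w)}\right)$$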
  • the adjustment of the parameters of the first discriminator in the first generative adversarial network includes: inputting the segmentation result of the source domain image after passing through the (first) image segmentation model and the segmentation result of the target domain image after passing through the (first) image segmentation model into the first discriminator respectively, and performing domain category discrimination on the source domain image and the target domain image respectively, to obtain the discrimination result of the source domain image and the discrimination result of the target domain image; determining the first cross-entropy loss function according to the discrimination result of the source domain image and the discrimination result of the target domain image; and adjusting the parameters of the first discriminator according to the first cross-entropy loss function.
  • the first cross-entropy loss function can be determined using the following formula.
  • P_s = G_inter(X_s) represents the segmentation result of the source domain image
  • P_t = G_inter(X_t) represents the segmentation result of the target domain image
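  • The first cross-entropy loss formula is rendered as an image in the original; a standard binary cross-entropy over the two domains (an assumed reconstruction) is:

$$\mathcal{L}_{D}(X_s, X_t) = -\sum_{h,w}\left[\log\left(D_{inter}(P_s)^{(h,w)}\right) + \log\left(1 - D_{inter}(P_t)^{(h,w)}\right)\right]$$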
  • the continuous adversarial game in the feature space enables the model to generate similar feature distributions in the target domain and the source domain, thereby improving the robustness and generalization of the model. A minimal sketch of the alternating training described above is given below.
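  • The following sketch shows one training epoch of the first generative adversarial network, assuming PyTorch; here G_inter returns per-pixel logits, D_inter returns a per-pixel probability in (0, 1), and the loaders, optimizers, and names are hypothetical, not the patent's implementation.

```python
import torch
import torch.nn.functional as F

def train_epoch_inter(G_inter, D_inter, src_loader, tgt_loader, opt_g, opt_d):
    SRC, TGT = 1.0, 0.0  # domain labels used by the discriminator
    for (x_s, y_s), x_t in zip(src_loader, tgt_loader):
        # 1) Adjust G_inter with the source-domain segmentation loss.
        seg_loss = F.cross_entropy(G_inter(x_s), y_s)
        opt_g.zero_grad(); seg_loss.backward(); opt_g.step()

        # 2) Readjust G_inter adversarially: push the discriminator to judge
        #    target-domain segmentation results as source-domain.
        p_t = torch.softmax(G_inter(x_t), dim=1)
        d_t = D_inter(p_t)
        adv_loss = F.binary_cross_entropy(d_t, torch.full_like(d_t, SRC))
        opt_g.zero_grad(); adv_loss.backward(); opt_g.step()

        # 3) Adjust D_inter with the cross-entropy over both domains.
        p_s = torch.softmax(G_inter(x_s), dim=1).detach()
        d_s, d_t = D_inter(p_s), D_inter(p_t.detach())
        d_loss = (F.binary_cross_entropy(d_s, torch.full_like(d_s, SRC)) +
                  F.binary_cross_entropy(d_t, torch.full_like(d_t, TGT)))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()
```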
  • step S104 the target domain images are divided into a first set and a second set, wherein, after the target domain images in the first set pass through the trained (first) image segmentation model, the obtained segmentation results are used as their label information, and the target domain images in the second set are not provided with label information.
  • the probability that a target (e.g., a road) exists in the target domain image is considered.
  • the target domain images are input into the trained (first) image segmentation model in the first generative adversarial network, the segmentation results of the target domain images are obtained, and pseudo-labels of the target domain images are generated, wherein the pseudo-label of a target domain image is used to mark each pixel in the target domain image as belonging to the target or the background; the score of each target domain image is determined according to its segmentation result and its pseudo-label; according to the score of each target domain image, some target domain images are selected to generate the first set, and the target domain images outside the first set generate the second set, wherein the pseudo-labels of the target domain images in the first set are used as the annotation information of those target domain images.
  • the score of the target domain image is determined using the following formula (formula (4)), which is rendered as an image in the original publication.
  • in formula (4), p_{(i,j)} represents the probability that the pixel at (i, j) in the target domain image belongs to the target, m_{(i,j)} represents the pseudo-label of the pixel at (i, j) in the target domain image, m_{(i,j)} = 1 indicates that the pixel at (i, j) belongs to the target, m_{(i,j)} = 0 indicates that the pixel at (i, j) belongs to the background, i and j are positive integers, and (i, j) represents the two-dimensional position of the pixel in the target domain image.
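  • Since the published formula (4) is only available as an image, a plausible form consistent with the surrounding description (an assumption, with the symbol names p and m introduced here for illustration) is the mean target confidence over pseudo-labeled target pixels:

$$S = \frac{\sum_{i,j} p_{(i,j)} \, m_{(i,j)}}{\sum_{i,j} m_{(i,j)}}$$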
  • the target domain images can be sorted by score from high to low, the target domain images ranked before a preset position are selected as the images in the first set, and the remaining target domain images are used as the images in the second set. For example, the images ranked in the top 70% are taken as the images in the first set, and the remaining 30% of the target domain images are taken as the images in the second set. A minimal sketch of this split is given below.
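  • The following is a minimal sketch of the sort-and-split step just described, assuming Python, a precomputed list of scores, and the 70% ratio used in the example; the function name is hypothetical.

```python
def split_target_domain(images, scores, ratio=0.7):
    """Sort target-domain images by score (high to low) and split them."""
    order = sorted(range(len(images)), key=lambda k: scores[k], reverse=True)
    cut = int(len(images) * ratio)
    first_set = [images[k] for k in order[:cut]]    # pseudo-labels become labels
    second_set = [images[k] for k in order[cut:]]   # kept unlabeled
    return first_set, second_set
```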
  • compared with other ways of dividing the target domain, this method focuses on scoring the target (road) information and sorts and divides the target domain according to the score, fully taking the target information of interest into account when forming the first set and the second set.
  • step S106 the target domain images in the first set and the second set are respectively input into the second generative adversarial network, and the second generative adversarial network is trained based on adversarial learning to obtain the trained second generative adversarial network, thereby determining the parameters of the second image segmentation model in the second generative adversarial network.
  • the second generator in the second generative adversarial network is a (second) image segmentation model, and its initial parameters are assigned the parameters of the first generator in the trained first generative adversarial network.
  • the second generative adversarial network includes a second generator and a second discriminator.
  • the second generator is a (second) image segmentation model that has the same structure as the (first) image segmentation model of the first generative adversarial network, and it is initialized with the parameters of the (first) image segmentation model in the trained first generative adversarial network.
  • the second generative adversarial network can be trained in a manner similar to the first generative adversarial network.
  • the (second) image segmentation model can output the segmentation results of the target domain images in the first set and in the second set.
  • the training of the second generative adversarial network may include: in each training epoch, adjusting the parameters of the (second) image segmentation model according to the difference between the segmentation results of the target domain images in the first set after passing through the (second) image segmentation model and the label information of the target domain images in the first set; readjusting the (second) image segmentation model based on adversarial learning, and adjusting the parameters of the second discriminator in the second generative adversarial network; and repeating the above process until the training of the second generative adversarial network is completed.
  • the segmentation loss function of the target domain can be determined according to the difference between the segmentation results of the target domain images in the first set after passing through the (second) image segmentation model and the annotation information of the target domain images in the first set, and the parameters of the (second) image segmentation model are adjusted based on the target-domain segmentation loss function.
  • the segmentation loss function of the target domain can use the cross-entropy loss function.
  • the first set is represented as (X te , M te )
  • X te represents the target domain image data in the first set
  • M te represents annotation information
  • the second set without annotation is represented as X th .
  • the (second) image segmentation model, serving as the second generator, is denoted G_intra.
  • the segmentation loss function of the target domain can be expressed by the following formula.
  • (h, w) is the two-dimensional position of each pixel in the target domain image
  • c is the number of categories to be divided, that is, the number of channels
  • x te represents the data of each target domain image in the first set.
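  • As with the source domain, the target-domain segmentation loss formula is rendered as an image in the original; a per-pixel cross-entropy consistent with the definitions above (an assumed reconstruction) is:

$$\mathcal{L}^{t}_{seg}(X_{te}, M_{te}) = -\sum_{h,w}\sum_{c} m_{te}^{(h,w,c)} \log\left(G_{intra}(x_{te})^{(h,w,c)}\right)$$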
  • readjusting the (second) image segmentation model based on adversarial learning includes: inputting the segmentation results of the target domain images in the second set after passing through the (second) image segmentation model into the second discriminator, performing domain category discrimination on the target domain images in the second set, and obtaining the discrimination results of the target domain images in the second set; determining the second adversarial loss function according to the discrimination results of the target domain images in the second set; and readjusting the (second) image segmentation model according to the second adversarial loss function.
  • the second adversarial loss function can be determined by the following formula.
  • D intra ( ⁇ ) represents the second discriminator function
  • (h, w) is the two-dimensional position of each pixel in the target domain image.
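  • A form of the second adversarial loss consistent with the definitions above (an assumed reconstruction, mirroring the first adversarial loss) is:

$$\mathcal{L}^{t}_{adv}(X_{th}) = -\sum_{h,w} \log\left(D_{intra}\left(G_{intra}(x_{th})\right)^{(h,w)}\right)$$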
  • adjusting the parameters of the second discriminator in the second generative adversarial network includes: inputting the segmentation results of the target domain images in the first set after passing through the (second) image segmentation model and the segmentation results of the target domain images in the second set after passing through the (second) image segmentation model into the second discriminator respectively, and performing domain category discrimination on the target domain images in the first set and the second set respectively, to obtain the discrimination results of the target domain images in the first set and the discrimination results of the target domain images in the second set; determining the second cross-entropy loss function according to the discrimination results of the target domain images in the first set and the discrimination results of the target domain images in the second set; and adjusting the parameters of the second discriminator according to the second cross-entropy loss function.
  • the second cross-entropy loss function can be determined using the following formula.
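  • A form of the second cross-entropy loss consistent with the definitions above (an assumed reconstruction, writing P_te = G_intra(X_te) and P_th = G_intra(X_th)) is:

$$\mathcal{L}^{t}_{D}(X_{te}, X_{th}) = -\sum_{h,w}\left[\log\left(D_{intra}(P_{te})^{(h,w)}\right) + \log\left(1 - D_{intra}(P_{th})^{(h,w)}\right)\right]$$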
  • Adjusting the parameters of the second generative adversarial network according to the above formulas (5)-(7) can obtain a (second) image segmentation model with better segmentation performance, and can generate more accurate pseudo-labels. Continuously repeat the adjustment of the parameters of the second generative adversarial network, and gradually improve the robustness and accuracy of the (second) image segmentation model on the target domain until the performance is saturated, and finally obtain the (second) image segmentation model we need.
  • G intra is used to perform segmentation tasks on target domain data.
  • step S108 the image to be segmented is input into the second image segmentation model whose parameters have been determined, to obtain the segmentation result of the image to be segmented; the target and the background in the image to be segmented can then be determined.
  • the source domain images and the target domain images are used to train the first generative adversarial network; then, according to the (first) image segmentation model in the trained first generative adversarial network, the target domain images are divided into a first set and a second set, and the second generative adversarial network, with the (second) image segmentation model as the second generator, is trained based on the first set and the second set.
  • the solution of the present disclosure includes a two-stage unsupervised model training method combining the inter-domain domain adaptation between the source domain and the target domain, and the intra-domain domain adaptation in the target domain.
  • adversarial learning is used to achieve domain adaptation between domains, reducing the inter-domain differences between the source and target domains and the intra-domain differences existing in the target domain itself.
  • adversarial learning is used to gradually eliminate the differences in the domain, and gradually improve the robustness and generalization of the model on the target domain, improve the segmentation performance of the model, and thus improve the accuracy of image segmentation.
  • the width and height of the data are between 1000 and 5000 pixels. Directly inputting such high-resolution images into a deep network may lead to excessive network load and insufficient hardware resources, and directly training on high-resolution images will prevent the network from effectively learning the characteristics of the road.
  • the training sample images can be preprocessed, which will be described below with reference to FIG. 3 .
  • the preprocessing method can also be applied to the preprocessing of other images where the target and background distributions are very uneven and complex.
  • FIG. 3 is a flowchart of other embodiments of the disclosed image segmentation method. As shown in FIG. 3, the method of this embodiment includes steps S302-S308.
  • step S302 for each training sample image, the training sample image and the label mask image of the training sample image are cut into a plurality of training sample image blocks and a plurality of label mask image blocks according to a preset size, respectively.
  • the training sample images include: remote sensing satellite images, or other images with high resolution and a huge difference in the number of target and background pixels.
  • the value of each pixel in the label mask image is 0 or 1, 0 means that the pixel in the training sample image at the same position as the pixel in the label mask image belongs to the target (road), 1 means the pixel in the label mask image The pixels in the training sample images at the same position belong to the background.
  • step S304 the label mask image blocks in which the number of pixels whose value is 1 exceeds the preset number, together with the corresponding training sample image blocks, are selected.
  • step S306 data enhancement is performed on the training sample image blocks corresponding to the selected label mask image blocks, and the number of training sample image blocks is increased to obtain preprocessed training sample image blocks.
  • data augmentation strategies such as flipping and rotation can be used on the screened image blocks to increase the sample size and diversity of the data, thereby improving the generalization and robustness of the model.
  • step S308 the preprocessed training sample image blocks are divided into source domain images and target domain images.
  • the source domain image has annotation information, and the target domain image does not have annotation information.
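  • The following is a minimal sketch of the preprocessing in steps S302-S308, assuming NumPy arrays; the patch size, the pixel-count threshold, and the final split are illustrative parameters, and the 0/1 mask convention follows the text above (0 = road, 1 = background).

```python
import numpy as np

def preprocess(image, mask, patch=512, min_count=1000, n_source=100):
    """Crop, screen, and augment one training sample image and its label mask."""
    blocks = []
    h, w = mask.shape
    # S302: crop the image and its label mask into fixed-size blocks.
    for top in range(0, h - patch + 1, patch):
        for left in range(0, w - patch + 1, patch):
            img_b = image[top:top + patch, left:left + patch]
            msk_b = mask[top:top + patch, left:left + patch]
            # S304: keep blocks whose count of value-1 pixels exceeds the threshold.
            if int((msk_b == 1).sum()) > min_count:
                blocks.append((img_b, msk_b))
                # S306: augment by flips and rotations to enlarge the sample set.
                blocks.append((np.fliplr(img_b), np.fliplr(msk_b)))
                blocks.append((np.rot90(img_b), np.rot90(msk_b)))
    # S308: divide the preprocessed blocks into source and target domain images
    # (here simply by count; the source-domain blocks keep their annotations).
    return blocks[:n_source], blocks[n_source:]
```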
  • the method of the above embodiments designs a set of strategies for cropping, screening, and data augmentation of remote sensing road images, so as to improve the compatibility of remote sensing images with traditional computer-vision domain adaptation methods.
  • the present disclosure also provides an image segmentation device, which will be described below with reference to FIG. 4 .
  • FIG. 4 is a structural diagram of some embodiments of the disclosed image segmentation apparatus. As shown in FIG. 4 , the apparatus 40 of this embodiment includes: a first training module 410 , a division module 420 , and a second training module 430 .
  • the first training module 410 is used for inputting the source domain image and the target domain image into the first generative adversarial network respectively, and training the first generative adversarial network based on the adversarial learning to obtain the trained first generative adversarial network.
  • the first generator in the adversarial network is the (first) image segmentation model.
  • the first training module 410 is configured to input the source domain image and the target domain image into the feature extraction layer of the (first) image segmentation model respectively, to obtain the first feature map of the source domain image and the second feature map of the target domain image; input the first feature map and the second feature map into the upsampling layer of the (first) image segmentation model respectively, to obtain the third feature map of the source domain image, with the same size as the source domain image, and the fourth feature map of the target domain image, with the same size as the target domain image; and input the third feature map and the fourth feature map into the softmax layer of the (first) image segmentation model respectively, to obtain the segmentation result of the source domain image and the segmentation result of the target domain image.
  • the feature extraction layer of the (first) image segmentation model includes a plurality of layers; in the order in which the source domain image or the target domain image passes through them, the downsampling factor of each feature extraction layer increases successively, and the last two feature extraction layers are each connected to an atrous convolution pyramid pooling module; when the downsampling factor of one or more of the feature extraction layers exceeds a threshold, atrous convolution is used in those layers so that their downsampling factor is kept at the threshold.
  • the first training module 410 is configured to sequentially input the source domain image and the target domain image into each feature extraction layer; input the features output by the last two feature extraction layers that the source domain image passes through into the atrous convolution pyramid pooling module respectively, to obtain the first multi-scale global feature and the second multi-scale global feature of the source domain image corresponding to the two feature layers; fuse the first multi-scale global feature and the second multi-scale global feature to obtain the first feature map of the source domain image; input the features output by the last two feature extraction layers that the target domain image passes through into the atrous convolution pyramid pooling module respectively, to obtain the third multi-scale global feature and the fourth multi-scale global feature of the target domain image corresponding to the two feature layers; and fuse the third multi-scale global feature and the fourth multi-scale global feature to obtain the second feature map of the target domain image.
  • the first training module 410 is configured to, in each training epoch, adjust the parameters of the (first) image segmentation model according to the difference between the segmentation result of the source domain image after passing through the (first) image segmentation model and the label information of the source domain image; readjust the (first) image segmentation model based on adversarial learning, and adjust the parameters of the first discriminator in the first generative adversarial network; and repeat the above process until the training of the first generative adversarial network is completed.
  • the first training module 410 is configured to input the segmentation result of the target domain image after passing through the (first) image segmentation model into the first discriminator, and perform domain category discrimination on the target domain image to obtain the discrimination result of the target domain image; determine the first adversarial loss function according to the discrimination result of the target domain image; and readjust the (first) image segmentation model according to the first adversarial loss function.
  • the first training module 410 is configured to input the segmentation result of the source domain image after passing through the (first) image segmentation model and the segmentation result of the target domain image after passing through the (first) image segmentation model into the first discriminator respectively; determine the first cross-entropy loss function according to the discrimination result of the source domain image and the discrimination result of the target domain image; and adjust the parameters of the first discriminator according to the first cross-entropy loss function.
  • the division module 420 is configured to divide the target domain images into a first set and a second set, wherein, after the target domain images in the first set pass through the (first) image segmentation model in the trained first generative adversarial network, the obtained segmentation results are used as label information, and the target domain images in the second set are not provided with label information.
  • the division module 420 is configured to input the target domain images into the trained (first) image segmentation model in the first generative adversarial network, obtain the segmentation results of the target domain images, and generate pseudo-labels of the target domain images, wherein the pseudo-label of a target domain image is used to mark each pixel in the target domain image as belonging to the target or the background; and determine the score of each target domain image according to its segmentation result and its pseudo-label, the score being determined using formula (4) described above.
  • the second training module 430 is configured to input the target domain images in the first set and the second set respectively into the second generative adversarial network, train the second generative adversarial network based on the adversarial learning, and obtain the trained second generative adversarial network, Thereby, the parameters of the (second) image segmentation model in the second generative adversarial network are determined, wherein the second generator in the second generative adversarial network is the (second) image segmentation model, the second image segmentation model and the first image segmentation model The structure is the same, and the initial parameters of the second image segmentation model are assigned as parameters of the first generator in the trained first generative adversarial network.
  • the second training module 430 is used for each training period, according to the segmentation results of the target domain images in the first set after passing through the (second) image segmentation model and the labeling information of the target domain images in the first set. difference, adjust the parameters of the (second) image segmentation model; re-adjust the (second) image segmentation model based on adversarial learning, and adjust the parameters of the second discriminator in the second generative adversarial network; repeat the above process until the training of the second generative adversarial network is completed.
  • the second training module 430 is configured to input the segmentation result of the target domain images in the second set after passing through the (second) image segmentation model into the second discriminator, and perform domain classification on the target domain images in the second set Discriminate to obtain the discrimination result of the target domain image in the second set; determine the second confrontation loss function according to the discrimination result of the target domain image in the second set; adjust the (second) image segmentation model again according to the second confrontation loss function .
  • the second training module 430 is configured to input the segmentation results of the target domain images in the first set after passing through the (second) image segmentation model and the segmentation results of the target domain images in the second set after passing through the (second) image segmentation model into the second discriminator respectively, and perform domain category discrimination on the target domain images in the first set and the second set respectively, to obtain the discrimination results of the target domain images in the first set and the discrimination results of the target domain images in the second set.
  • the source domain images and the target domain images are remote sensing satellite images including roads; the apparatus 40 further includes a preprocessing module 440 configured to: crop the training sample images and the label mask images of the training sample images into a plurality of training sample image blocks and a plurality of label mask image blocks according to a preset size, wherein the training sample images include remote sensing satellite images, the value of each pixel in the label mask image is 0 or 1, 0 indicates that the pixel in the training sample image at the same position as the pixel in the label mask image belongs to the road, and 1 indicates that the pixel in the training sample image at the same position as the pixel in the label mask image belongs to the background; select the label mask image blocks in which the number of pixels with value 1 exceeds the preset number, together with the corresponding training sample image blocks; perform data augmentation on the training sample image blocks corresponding to the selected label mask image blocks to increase the number of training sample image blocks and obtain preprocessed training sample image blocks; and divide the preprocessed training sample image blocks into source domain images and target domain images.
  • the apparatus 40 further includes: an image segmentation module 450, configured to input the image to be segmented into the (second) image segmentation model of the determined parameters, and obtain a segmentation result of the image to be segmented.
  • the image segmentation apparatuses in the embodiments of the present disclosure may be implemented by various computing devices or computer systems, which will be described below with reference to FIG. 5 and FIG. 6 .
  • FIG. 5 is a structural diagram of some embodiments of the disclosed image segmentation apparatus.
  • the apparatus 50 of this embodiment includes a memory 510 and a processor 520 coupled to the memory 510, the processor 520 being configured to execute any of the implementations of the present disclosure based on instructions stored in the memory 510 The image segmentation method in the example.
  • the memory 510 may include, for example, a system memory, a fixed non-volatile storage medium, and the like.
  • the system memory stores, for example, an operating system, an application program, a boot loader (Boot Loader), a database, and other programs.
  • FIG. 6 is a structural diagram of other embodiments of the disclosed image segmentation apparatus.
  • the apparatus 60 of this embodiment includes: a memory 610 and a processor 620 , which are similar to the memory 510 and the processor 520 respectively. It may also include an input-output interface 630, a network interface 640, a storage interface 650, and the like. These interfaces 630 , 640 , 650 and the memory 610 and the processor 620 can be connected, for example, through a bus 660 .
  • the input and output interface 630 provides a connection interface for input and output devices such as a display, a mouse, a keyboard, and a touch screen.
  • the network interface 640 provides a connection interface for various networked devices, for example, it can be connected to a database server or a cloud storage server.
  • the storage interface 650 provides a connection interface for external storage devices such as SD cards and U disks.
  • embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein .
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to an image segmentation method, an apparatus and a computer-readable storage medium, and relates to the field of computer technology. The method of the present disclosure includes: inputting a source domain image and a target domain image into a first generative adversarial network respectively, and training the first generative adversarial network based on adversarial learning to obtain a trained first generative adversarial network, wherein the first generator in the first generative adversarial network is an image segmentation model; dividing the target domain images into a first set and a second set, wherein, after the target domain images in the first set pass through the image segmentation model in the trained first generative adversarial network, the obtained segmentation results are used as label information, and the target domain images in the second set are not provided with label information; and inputting the target domain images in the first set and the second set respectively into a second generative adversarial network, and training the second generative adversarial network based on adversarial learning to obtain a trained second generative adversarial network, thereby determining the parameters of the image segmentation model.

Description

Decoding method, apparatus and computer-readable storage medium
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is based on and claims priority to CN application No. 202110325191.9, filed on March 26, 2021, the disclosure of which is hereby incorporated into this application in its entirety.
TECHNICAL FIELD
The present disclosure relates to the field of computer technology, and in particular, to an image segmentation method, an apparatus, and a computer-readable storage medium.
BACKGROUND
Segmentation of remote sensing road images aims to separate road information from complex high-resolution remote sensing images, which is a very challenging task. It is an important research direction in the field of remote sensing and also has important applications in daily life, such as vehicle navigation, map information updating, urban planning, and disaster relief.
One method known to the inventors is to use a domain adaptation method for road segmentation of images. However, most domain adaptation methods have been studied on conventional natural scene images, and perform inter-domain adaptation from synthetic images to real images.
SUMMARY
According to some embodiments of the present disclosure, an image segmentation method is provided, comprising: inputting a source domain image and a target domain image respectively into a first generative adversarial network, and training the first generative adversarial network based on adversarial learning to obtain a trained first generative adversarial network, wherein a first generator in the first generative adversarial network is a first image segmentation model; dividing the target domain images into a first set and a second set, wherein segmentation results obtained after the target domain images in the first set pass through the first image segmentation model in the trained first generative adversarial network are used as label information, and no label information is set for the target domain images in the second set; and inputting the target domain images in the first set and the second set respectively into a second generative adversarial network, and training the second generative adversarial network based on adversarial learning to obtain a trained second generative adversarial network, thereby determining parameters of a second image segmentation model in the second generative adversarial network, wherein a second generator in the second generative adversarial network is the second image segmentation model, the second image segmentation model has the same structure as the first image segmentation model, and initial parameters of the second image segmentation model are assigned the parameters of the first generator in the trained first generative adversarial network.
In some embodiments, inputting the source domain image and the target domain image respectively into the first generative adversarial network comprises: inputting the source domain image and the target domain image respectively into a feature extraction layer of the first image segmentation model to obtain a first feature map of the source domain image and a second feature map of the target domain image, respectively; inputting the first feature map and the second feature map respectively into an upsampling layer of the first image segmentation model to obtain a third feature map of the source domain image with the same size as the source domain image and a fourth feature map of the target domain image with the same size as the target domain image, respectively; and inputting the third feature map and the fourth feature map respectively into a softmax layer of the first image segmentation model to obtain a segmentation result of the source domain image and a segmentation result of the target domain image, respectively.
In some embodiments, the first image segmentation model comprises a plurality of feature extraction layers; in the order in which the source domain image or the target domain image passes through them, the downsampling factors of the feature extraction layers increase successively, and the last two feature extraction layers are each connected to an atrous spatial pyramid pooling module. In the case where the downsampling factor of one or more of the feature extraction layers exceeds a threshold, atrous convolution is used for the one or more feature extraction layers so that the downsampling factor of the one or more feature extraction layers is kept at the threshold.
In some embodiments, inputting the source domain image and the target domain image respectively into the feature extraction layer of the first image segmentation model to obtain the first feature map of the source domain image and the second feature map of the target domain image, respectively, comprises: inputting the source domain image and the target domain image into the feature extraction layers in sequence; inputting the features output by the last two feature extraction layers through which the source domain image passes respectively into the atrous spatial pyramid pooling modules to obtain a first multi-scale global feature and a second multi-scale global feature of the source domain image corresponding to the two output feature layers, respectively; fusing the first multi-scale global feature and the second multi-scale global feature to obtain the first feature map of the source domain image; inputting the features output by the last two feature extraction layers through which the target domain image passes respectively into the atrous spatial pyramid pooling modules to obtain a third multi-scale global feature and a fourth multi-scale global feature of the target domain image corresponding to the two output feature layers, respectively; and fusing the third multi-scale global feature and the fourth multi-scale global feature to obtain the second feature map of the target domain image.
In some embodiments, training the first generative adversarial network based on adversarial learning comprises: in each training epoch, adjusting parameters of the first image segmentation model according to the difference between the segmentation result of the source domain image after passing through the first image segmentation model and the label information of the source domain image; adjusting the first image segmentation model again based on adversarial learning, and adjusting parameters of a first discriminator in the first generative adversarial network; and repeating the above process until the training of the first generative adversarial network is completed.
In some embodiments, adjusting the first image segmentation model again based on adversarial learning comprises: inputting the segmentation result of the target domain image after passing through the first image segmentation model into the first discriminator, and performing domain category discrimination on the target domain image to obtain a discrimination result of the target domain image; determining a first adversarial loss function according to the discrimination result of the target domain image; and adjusting the first image segmentation model again according to the first adversarial loss function.
In some embodiments, adjusting the parameters of the first discriminator in the first generative adversarial network comprises: inputting the segmentation result of the source domain image after passing through the first image segmentation model and the segmentation result of the target domain image after passing through the first image segmentation model respectively into the first discriminator, and performing domain category discrimination on the source domain image and the target domain image respectively to obtain a discrimination result of the source domain image and a discrimination result of the target domain image; determining a first cross-entropy loss function according to the discrimination result of the source domain image and the discrimination result of the target domain image; and adjusting the parameters of the first discriminator according to the first cross-entropy loss function.
In some embodiments, dividing the target domain images into the first set and the second set comprises: inputting the target domain images into the first image segmentation model in the trained first generative adversarial network to obtain segmentation results of the target domain images, and generating pseudo-labels of the target domain images, wherein the pseudo-label of a target domain image is used to label each pixel in the target domain image as belonging to the target or to the background; determining a score of each target domain image according to the segmentation result of the target domain image and the pseudo-label of the target domain image, wherein the higher the probability values, in the segmentation result, of the pixels belonging to the target in the target domain image, and the fewer the pixels labeled as belonging to the target by the pseudo-label of the target domain image, the higher the score of the target domain image; and selecting, according to the scores of the target domain images, some of the target domain images to generate the first set, with the target domain images outside the first set generating the second set, wherein the pseudo-label of a target domain image in the first set is used as the label information of that target domain image.
In some embodiments, the score of a target domain image is determined using the following formula:

$$S = \frac{\sum_{i,j} p^{(i,j)}\,\mathbb{1}\left[m^{(i,j)} = 1\right]}{\sum_{i,j} \mathbb{1}\left[m^{(i,j)} = 1\right]}$$

where $p^{(i,j)}$ denotes the probability that the pixel at (i, j) in the target domain image belongs to the target, $m^{(i,j)}$ denotes the pseudo-label of the pixel at (i, j) in the target domain image, $m^{(i,j)} = 1$ indicates that the pixel at (i, j) in the target domain image belongs to the target, $m^{(i,j)} = 0$ indicates that the pixel at (i, j) in the target domain image belongs to the background, i and j are positive integers, and (i, j) denotes the two-dimensional position of a pixel in the target domain image.
In some embodiments, training the second generative adversarial network based on adversarial learning comprises: in each training epoch, adjusting parameters of the second image segmentation model according to the difference between the segmentation results of the target domain images in the first set after passing through the second image segmentation model and the label information of the target domain images in the first set; adjusting the second image segmentation model again based on adversarial learning, and adjusting parameters of a second discriminator in the second generative adversarial network; and repeating the above process until the training of the second generative adversarial network is completed.
In some embodiments, adjusting the second image segmentation model again based on adversarial learning comprises: inputting the segmentation results of the target domain images in the second set after passing through the second image segmentation model into the second discriminator, and performing domain category discrimination on the target domain images in the second set to obtain discrimination results of the target domain images in the second set; determining a second adversarial loss function according to the discrimination results of the target domain images in the second set; and adjusting the second image segmentation model again according to the second adversarial loss function.
In some embodiments, adjusting the parameters of the second discriminator in the second generative adversarial network comprises: inputting the segmentation results of the target domain images in the first set after passing through the second image segmentation model and the segmentation results of the target domain images in the second set after passing through the second image segmentation model respectively into the second discriminator, and performing domain category discrimination on the target domain images in the first set and in the second set respectively to obtain discrimination results of the target domain images in the first set and discrimination results of the target domain images in the second set; determining a second cross-entropy loss function according to the discrimination results of the target domain images in the first set and the discrimination results of the target domain images in the second set; and adjusting the parameters of the second discriminator according to the second cross-entropy loss function.
In some embodiments, the source domain image and the target domain image are remote sensing satellite images including roads. The method further comprises: cropping a training sample image and a label mask image of the training sample image into a plurality of training sample image blocks and a plurality of label mask image blocks, respectively, according to a preset size, wherein the training sample image comprises a remote sensing satellite image, the value of each pixel in the label mask image is 0 or 1, 0 indicates that the pixel at the same position in the training sample image as the pixel in the label mask image belongs to a road, and 1 indicates that the pixel at the same position in the training sample image as the pixel in the label mask image belongs to the background; selecting the label mask image blocks in which the number of pixels with a value of 1 exceeds a preset number, and the corresponding training sample image blocks; performing data augmentation on the training sample image blocks corresponding to the selected label mask image blocks to increase the number of training sample image blocks, obtaining preprocessed training sample image blocks; and dividing the preprocessed training sample image blocks into source domain images and target domain images.
In some embodiments, the method further comprises: inputting an image to be segmented into the second image segmentation model with the determined parameters to obtain a segmentation result of the image to be segmented.
According to other embodiments of the present disclosure, an image segmentation apparatus is provided, comprising: a first training module configured to input a source domain image and a target domain image respectively into a first generative adversarial network and train the first generative adversarial network based on adversarial learning to obtain a trained first generative adversarial network, wherein a first generator in the first generative adversarial network is a first image segmentation model; a dividing module configured to divide the target domain images into a first set and a second set, wherein segmentation results obtained after the target domain images in the first set pass through the first image segmentation model in the trained first generative adversarial network are used as label information, and no label information is set for the target domain images in the second set; and a second training module configured to input the target domain images in the first set and the second set respectively into a second generative adversarial network and train the second generative adversarial network based on adversarial learning to obtain a trained second generative adversarial network, thereby determining parameters of a second image segmentation model in the second generative adversarial network, wherein a second generator in the second generative adversarial network is the second image segmentation model, the second image segmentation model has the same structure as the first image segmentation model, and initial parameters of the second image segmentation model are assigned the parameters of the first generator in the trained first generative adversarial network.
According to still other embodiments of the present disclosure, an image segmentation apparatus is provided, comprising: a processor; and a memory coupled to the processor and configured to store instructions which, when executed by the processor, cause the processor to perform the image segmentation method of any of the foregoing embodiments.
According to yet other embodiments of the present disclosure, a non-transitory computer-readable storage medium is provided, having a computer program stored thereon, wherein the program, when executed by a processor, implements the image segmentation method of any of the foregoing embodiments.
Other features and advantages of the present disclosure will become apparent from the following detailed description of exemplary embodiments of the present disclosure with reference to the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
The drawings described herein are provided for a further understanding of the present disclosure and constitute a part of this application. The illustrative embodiments of the present disclosure and the description thereof are configured to explain the present disclosure and do not constitute an improper limitation of the present disclosure.
FIG. 1 shows a schematic flowchart of an image segmentation method according to some embodiments of the present disclosure.
FIG. 2 shows a schematic structural diagram of a model according to some embodiments of the present disclosure.
FIG. 3 shows a schematic flowchart of an image segmentation method according to other embodiments of the present disclosure.
FIG. 4 shows a schematic structural diagram of an image segmentation apparatus according to some embodiments of the present disclosure.
FIG. 5 shows a schematic structural diagram of an image segmentation apparatus according to other embodiments of the present disclosure.
FIG. 6 shows a schematic structural diagram of an image segmentation apparatus according to still other embodiments of the present disclosure.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only some of the embodiments of the present disclosure, rather than all of them. The following description of at least one exemplary embodiment is merely illustrative and in no way serves as any limitation on the present disclosure or its application or use. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
The inventors have found that high-resolution remote sensing satellite images have complex environmental backgrounds and an extremely imbalanced ratio of background pixel samples to road pixel samples. The gap between source domain images and target domain images is large, and the gaps among the remote sensing images within the target domain are also relatively large. If an inter-domain adaptation method is used directly to train a model, the trained model is only a rough preliminary model on the target domain and cannot be used accurately for subsequent road segmentation of remote sensing satellite images.
A technical problem to be solved by the present disclosure is: how to improve the accuracy of image segmentation.
The present disclosure proposes an image segmentation method that is applicable not only to road segmentation scenarios for remote sensing satellite images, but also to other scenarios in which the images are relatively complex, the gap between the source domain and target domain images is large, and the gaps among the images within the target domain are also relatively large. The image segmentation method of the present disclosure is described below with reference to FIGS. 1 to 3.
FIG. 1 is a flowchart of some embodiments of the image segmentation method of the present disclosure. As shown in FIG. 1, the method of this embodiment includes steps S102 to S106.
In step S102, a source domain image and a target domain image are respectively input into a first generative adversarial network, and the first generative adversarial network is trained based on adversarial learning to obtain a trained first generative adversarial network.
For example, in a road segmentation scenario for remote sensing satellite images, the source domain images and the target domain images are remote sensing satellite images including roads. As shown in FIG. 2, the overall network includes a first generative adversarial network and a second generative adversarial network. The first generative adversarial network includes a first generator and a first discriminator, the first generator being the (first) image segmentation model; the second generative adversarial network includes a second generator and a second discriminator, the second generator being the second image segmentation model, which has exactly the same structure as the first image segmentation model of the first generative adversarial network. In some embodiments, the (first) image segmentation model includes a feature extraction layer, an upsampling layer, and a softmax layer. For example, an existing network model, such as ResNet-101 or ResNet-50, can be used as the backbone network for feature extraction to serve as the feature extraction layer; the examples given here are not limiting. The upsampling layer is used to upsample the feature map (or feature tensor) output by the feature extraction layer and restore it to the same size as the input image. The upsampled feature map has two channels in the depth direction; after normalization by the softmax layer, each pixel position corresponds to a binary vector representing the predicted probability values of the background and the target (for example, a road), respectively, which serves as the segmentation result.
In some embodiments, for each source domain image and each target domain image, the source domain image and the target domain image are respectively input into the feature extraction layer of the (first) image segmentation model to obtain a first feature map of the source domain image and a second feature map of the target domain image, respectively; the first feature map and the second feature map are respectively input into the upsampling layer of the (first) image segmentation model to obtain a third feature map of the source domain image with the same size as the source domain image and a fourth feature map of the target domain image with the same size as the target domain image, respectively; and the third feature map and the fourth feature map are respectively input into the softmax layer of the (first) image segmentation model to obtain the segmentation result of the source domain image and the segmentation result of the target domain image, respectively. The segmentation results of the source domain image and the target domain image can then be input into the first discriminator for discrimination, as described in the following embodiments.
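The following is a minimal PyTorch sketch of the segmentation model structure described above: a ResNet-101 backbone as the feature extraction layer, an upsampling layer restoring the input resolution, and a softmax layer producing per-pixel background/target probabilities. The class name, the plain 1x1 classification head, and the omission of the ASPP modules (sketched further below) are simplifications for illustration, not details taken from this disclosure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class RoadSegmentationModel(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        # replace_stride_with_dilation=[False, False, True] turns the stride-2
        # convolutions of the last stage into dilated convolutions, so the
        # overall downsampling factor stays at 16 instead of 32.
        backbone = torchvision.models.resnet101(
            weights=None, replace_stride_with_dilation=[False, False, True])
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        self.layer1, self.layer2 = backbone.layer1, backbone.layer2
        self.layer3, self.layer4 = backbone.layer3, backbone.layer4
        self.classifier = nn.Conv2d(2048, num_classes, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        c1 = self.stem(x)
        c2 = self.layer1(c1)
        c3 = self.layer2(c2)
        c4 = self.layer3(c3)   # the last two stages (C4, C5) would feed
        c5 = self.layer4(c4)   # the ASPP modules in the full model
        logits = self.classifier(c5)
        # upsample back to the input size, then softmax over the 2 channels
        logits = F.interpolate(logits, size=(h, w), mode='bilinear',
                               align_corners=False)
        return torch.softmax(logits, dim=1)
```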
For the scenario of road segmentation in remote sensing satellite images, since roads are slender and their information is easily lost, the present disclosure further proposes a method for improving the (first) image segmentation model.
In some embodiments, as shown in FIG. 2, the (first) image segmentation model includes a plurality of feature extraction layers. In the order in which the source domain image or the target domain image passes through them, the downsampling factors of the feature extraction layers increase successively, and the last two feature extraction layers are each connected to an Atrous Spatial Pyramid Pooling (ASPP) module. In the case where the downsampling factor of one or more of the feature extraction layers exceeds a threshold, atrous convolution is used for the one or more feature extraction layers so that their downsampling factor is kept at the threshold.
Further, for example, for each source domain image and each target domain image, the source domain image and the target domain image are input into the feature extraction layers in sequence. The features output by the last two feature extraction layers through which the source domain image passes are respectively input into the ASPP modules to obtain the first multi-scale global feature and the second multi-scale global feature of the source domain image corresponding to the two output feature layers; the multi-scale global features of the source domain image corresponding to the two feature layers, i.e., the first and second multi-scale global features, are fused to obtain the first feature map of the source domain image. The features output by the last two feature extraction layers through which the target domain image passes are respectively input into the ASPP modules to obtain the third multi-scale global feature and the fourth multi-scale global feature of the target domain image corresponding to the two output feature layers; the multi-scale global features of the target domain image corresponding to the two feature layers, i.e., the third and fourth multi-scale global features, are fused to obtain the second feature map of the target domain image.
For example, the (first) image segmentation model uses ResNet-101 as the backbone network for feature extraction, and an image (a source domain image or a target domain image) passes through the feature extraction layers to obtain hierarchical features $(C_1, C_2, C_3, C_4, C_5)$. Since the last feature extraction layer uses 32x downsampling, road information would be lost, while keeping large-scale features would increase the amount of computation and reduce the receptive field of the model. Therefore, atrous convolution can be used in the last feature extraction layer, which keeps the downsampling factor at the threshold while enlarging the receptive field of the model. The inventors found through experiments that the overall performance of the model is good when the threshold is 16. Further, the ASPP modules are applied to the last two hierarchical features $C_4$ and $C_5$ to obtain multi-scale global information (denoted here as $F_4$ and $F_5$); finally, $F_4$ and $F_5$ are fused, upsampled, and normalized to obtain the segmentation result of the image.
In the method of the above embodiments, an atrous spatial pyramid pooling design is tailored to the road characteristics of remote sensing road images, improving the robustness and effective representation of road information during feature extraction and upsampling, and thereby improving the performance of remote sensing road segmentation. The ASPP module fully takes into account both model performance and model complexity: by using atrous convolution, the deep features keep a relatively large resolution so that road information is not lost, and multi-level feature fusion then further explicitly enhances the road representation capability, improving the distinguishability of road information and the prediction accuracy of the final segmentation result.
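As an illustration of the ASPP module described above, the following sketch uses parallel atrous convolutions with different dilation rates plus a global pooling branch. The specific dilation rates (1, 6, 12, 18), channel width, and fusion by concatenation are common DeepLab-style choices assumed here, not values given in this disclosure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    def __init__(self, in_ch: int, out_ch: int = 256, rates=(1, 6, 12, 18)):
        super().__init__()
        # one 1x1 branch (rate 1) and several 3x3 atrous branches
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3 if r > 1 else 1,
                      padding=r if r > 1 else 0, dilation=r)
            for r in rates])
        # image-level branch: global average pooling captures global context
        self.global_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(in_ch, out_ch, 1))
        self.project = nn.Conv2d(out_ch * (len(rates) + 1), out_ch, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        g = F.interpolate(self.global_branch(x), size=(h, w),
                          mode='bilinear', align_corners=False)
        return self.project(torch.cat(feats + [g], dim=1))

# In the full model, one ASPP is applied to C4 and one to C5, and the two
# outputs F4, F5 are fused (e.g., by concatenation) before upsampling:
#   f4, f5 = ASPP(1024)(c4), ASPP(2048)(c5)
```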
After the source domain images and the target domain images are input into the first generative adversarial network, the (first) image segmentation model can output a segmentation result for each source domain image and each target domain image; a segmentation result includes, for example, the probability that each pixel belongs to the target and to the background. As shown in FIG. 2, training the first generative adversarial network may include: in each training epoch, adjusting the parameters of the (first) image segmentation model according to the difference between the segmentation results of the source domain images after passing through the (first) image segmentation model and the label information of the source domain images; adjusting the (first) image segmentation model again based on adversarial learning, and adjusting the parameters of the first discriminator in the first generative adversarial network; and repeating the above process until the training of the first generative adversarial network is completed.
The segmentation loss function of the source domain can be determined according to the difference between the segmentation result of a source domain image after passing through the (first) image segmentation model and the label information of the source domain image, and the parameters of the (first) image segmentation model are adjusted based on the segmentation loss function of the source domain. The segmentation loss function of the source domain can be a cross-entropy loss function. For example, the set of labeled source domain images is denoted $(X_s, Y_s)$, where $X_s$ denotes the source domain image data and $Y_s$ denotes the label information, and the set of unlabeled target domain images is denoted $X_t$. The (first) image segmentation model, as the first generator, is denoted $G_{inter}$. The segmentation loss function of the source domain can be expressed by the following formula:

$$L_{seg}(G_{inter}) = -\sum_{h,w}\sum_{c} y_s^{(h,w,c)} \log\left(G_{inter}(x_s)^{(h,w,c)}\right) \qquad (1)$$

In formula (1), (h, w) is the two-dimensional position of each pixel in the source domain image, c is the number of segmentation classes, i.e., the number of channels, $y_s^{(h,w,c)}$ is the label information of each pixel in each source domain image (for example, 1 denotes the target and 0 denotes the background), and $x_s$ denotes the data of each source domain image.
In some embodiments, as shown in FIG. 2, adjusting the (first) image segmentation model again based on adversarial learning includes: inputting the segmentation result of a target domain image after passing through the (first) image segmentation model into the first discriminator, and performing domain category discrimination on the target domain image to obtain the discrimination result of the target domain image; determining a first adversarial loss function according to the discrimination result of the target domain image; and adjusting the (first) image segmentation model again according to the first adversarial loss function. The first adversarial loss function can be determined by the following formula:

$$L_{adv}(G_{inter}) = -\sum_{h,w} \log\left(D_{inter}\left(G_{inter}(x_t)\right)^{(h,w)}\right) \qquad (2)$$

In formula (2), $D_{inter}(\cdot)$ denotes the first discriminator function, and (h, w) is the two-dimensional position of each pixel in the target domain image.
In some embodiments, as shown in FIG. 2, adjusting the parameters of the first discriminator in the first generative adversarial network includes: inputting the segmentation result of a source domain image after passing through the (first) image segmentation model and the segmentation result of a target domain image after passing through the (first) image segmentation model respectively into the first discriminator, and performing domain category discrimination on the source domain image and the target domain image respectively to obtain the discrimination result of the source domain image and the discrimination result of the target domain image; determining a first cross-entropy loss function according to the discrimination result of the source domain image and the discrimination result of the target domain image; and adjusting the parameters of the first discriminator according to the first cross-entropy loss function. The first cross-entropy loss function can be determined by the following formula:

$$L_{D}(D_{inter}) = -\sum_{h,w}\left[\log\left(D_{inter}(P_s)^{(h,w)}\right) + \log\left(1 - D_{inter}(P_t)^{(h,w)}\right)\right] \qquad (3)$$

In formula (3), $P_s = G_{inter}(X_s)$ denotes the segmentation result of the source domain image, and $P_t = G_{inter}(X_t)$ denotes the segmentation result of the target domain image.
Through the idea of adversarial learning, the above embodiments play a continuous adversarial game in the feature space, so that the model can generate similar feature distributions on the target domain and the source domain, improving the robustness and generalization of the model.
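A sketch of one stage-one training step following formulas (1)-(3) is shown below. The optimizers, the discriminator label convention (source = 1, target = 0), and the adversarial loss weight of 0.001 are assumptions for illustration, not values stated in this disclosure.

```python
import torch
import torch.nn.functional as F

def train_step(G, D, opt_G, opt_D, x_s, y_s, x_t):
    # (1) supervised segmentation loss on the labeled source image;
    # G returns per-pixel probabilities, so nll_loss takes their log
    p_s = G(x_s)                                    # (N, 2, H, W)
    loss_seg = F.nll_loss(torch.log(p_s + 1e-8), y_s)
    # (2) adversarial loss: push target predictions to look source-like
    p_t = G(x_t)
    d_t_for_g = D(p_t)                              # discriminator logits
    loss_adv = F.binary_cross_entropy_with_logits(
        d_t_for_g, torch.ones_like(d_t_for_g))      # fool D into "source"
    opt_G.zero_grad()
    (loss_seg + 0.001 * loss_adv).backward()        # 0.001: assumed weight
    opt_G.step()
    # (3) discriminator cross-entropy on detached predictions
    d_s, d_t = D(p_s.detach()), D(p_t.detach())
    loss_D = (F.binary_cross_entropy_with_logits(d_s, torch.ones_like(d_s))
              + F.binary_cross_entropy_with_logits(d_t, torch.zeros_like(d_t)))
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()
```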
In step S104, the target domain images are divided into a first set and a second set.
In some embodiments, the segmentation results obtained after the target domain images in the first set pass through the (first) image segmentation model in the trained first generative adversarial network are used as label information, and no label information is set for the target domain images in the second set. To further improve the accuracy of model training, the probability that a target (for example, a road) exists in a target domain image is considered when dividing the first set and the second set.
In some embodiments, the target domain images are input into the (first) image segmentation model in the trained first generative adversarial network to obtain the segmentation results of the target domain images, and pseudo-labels of the target domain images are generated, where the pseudo-label of a target domain image labels each pixel in the image as belonging to the target or to the background. The score of each target domain image is determined according to its segmentation result and its pseudo-label. According to the scores of the target domain images, some target domain images are selected to generate the first set, and the target domain images outside the first set generate the second set; the pseudo-label of a target domain image in the first set is used as the label information of that image. The higher the probability values, in the segmentation result, of the pixels belonging to the target, and the fewer the pixels labeled as belonging to the target by the pseudo-label, the higher the score of the target domain image. For example, the score of a target domain image is determined by the following formula:

$$S = \frac{\sum_{i,j} p^{(i,j)}\,\mathbb{1}\left[m^{(i,j)} = 1\right]}{\sum_{i,j} \mathbb{1}\left[m^{(i,j)} = 1\right]} \qquad (4)$$

In formula (4), $p^{(i,j)}$ denotes the probability that the pixel at (i, j) in the target domain image belongs to the target, $m^{(i,j)}$ denotes the pseudo-label of the pixel at (i, j), $m^{(i,j)} = 1$ indicates that the pixel at (i, j) belongs to the target, $m^{(i,j)} = 0$ indicates that the pixel at (i, j) belongs to the background, i and j are positive integers, and (i, j) denotes the two-dimensional position of a pixel in the target domain image.
The target domain images can be sorted from high to low by score, the target domain images ranked before a preset position are selected as the images in the first set, and the remaining target domain images serve as the images in the second set. For example, the top 70% of the images are taken as the images in the first set, and the remaining 30% of the target domain images as the images in the second set.
In the intra-domain first set and second set divided based on the above method, the scoring focuses on the target (road) information, and the target domain is sorted and divided according to the scores, fully taking into account the target information of primary concern.
In step S106, the target domain images in the first set and the second set are respectively input into a second generative adversarial network, and the second generative adversarial network is trained based on adversarial learning to obtain a trained second generative adversarial network, thereby determining the parameters of the second image segmentation model in the second generative adversarial network.
As shown in FIG. 2, the second generator in the second generative adversarial network is the (second) image segmentation model, whose initial parameters are assigned the parameters of the first generator in the trained first generative adversarial network. The second generative adversarial network includes a second generator and a second discriminator; the second generator, i.e., the (second) image segmentation model, has the same structure as the (first) image segmentation model of the first generative adversarial network and is initialized with the parameters of the (first) image segmentation model in the trained first generative adversarial network. The second generative adversarial network can be trained in a similar way to the first generative adversarial network.
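In code, this initialization amounts to copying the stage-one generator's weights into a structurally identical model; the names below refer to the earlier illustrative sketch and are not taken from this disclosure.

```python
# G_inter: trained in stage one; G_intra: same architecture, stage two
G_inter = RoadSegmentationModel()
G_intra = RoadSegmentationModel()
G_intra.load_state_dict(G_inter.state_dict())   # copy trained parameters
```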
After the target domain images in the first set and the target domain images in the second set are input into the second generative adversarial network, the (second) image segmentation model can output segmentation results for the target domain images in the first set and in the second set. As shown in FIG. 2, training the second generative adversarial network may include: in each training epoch, adjusting the parameters of the (second) image segmentation model according to the difference between the segmentation results of the target domain images in the first set after passing through the (second) image segmentation model and the label information of the target domain images in the first set; adjusting the (second) image segmentation model again based on adversarial learning, and adjusting the parameters of the second discriminator in the second generative adversarial network; and repeating the above process until the training of the second generative adversarial network is completed.
The segmentation loss function of the target domain can be determined according to the difference between the segmentation results of the target domain images in the first set after passing through the (second) image segmentation model and the label information of the target domain images in the first set, and the parameters of the (second) image segmentation model are adjusted based on the segmentation loss function of the target domain. The segmentation loss function of the target domain can be a cross-entropy loss function. For example, the first set is denoted $(X_{te}, M_{te})$, where $X_{te}$ denotes the target domain image data in the first set and $M_{te}$ denotes the label information, and the unlabeled second set is denoted $X_{th}$. The (second) image segmentation model, as the second generator, is denoted $G_{intra}$. The segmentation loss function of the target domain can be expressed by the following formula:

$$L_{seg}(G_{intra}) = -\sum_{h,w}\sum_{c} m_{te}^{(h,w,c)} \log\left(G_{intra}(x_{te})^{(h,w,c)}\right) \qquad (5)$$

In formula (5), (h, w) is the two-dimensional position of each pixel in the target domain image, c is the number of segmentation classes, i.e., the number of channels, $m_{te}^{(h,w,c)}$ is the label information of each pixel in each target domain image in the first set (for example, 1 denotes the target and 0 denotes the background), and $x_{te}$ denotes the data of each target domain image in the first set.
In some embodiments, as shown in FIG. 2, adjusting the (second) image segmentation model again based on adversarial learning includes: inputting the segmentation results of the target domain images in the second set after passing through the (second) image segmentation model into the second discriminator, and performing domain category discrimination on the target domain images in the second set to obtain the discrimination results of the target domain images in the second set; determining a second adversarial loss function according to the discrimination results of the target domain images in the second set; and adjusting the (second) image segmentation model again according to the second adversarial loss function. The second adversarial loss function can be determined by the following formula:

$$L_{adv}(G_{intra}) = -\sum_{h,w} \log\left(D_{intra}\left(G_{intra}(x_{th})\right)^{(h,w)}\right) \qquad (6)$$

In formula (6), $D_{intra}(\cdot)$ denotes the second discriminator function, and (h, w) is the two-dimensional position of each pixel in the target domain image.
In some embodiments, as shown in FIG. 2, adjusting the parameters of the second discriminator in the second generative adversarial network includes: inputting the segmentation results of the target domain images in the first set after passing through the (second) image segmentation model and the segmentation results of the target domain images in the second set after passing through the (second) image segmentation model respectively into the second discriminator, and performing domain category discrimination on the target domain images in the first set and in the second set respectively to obtain the discrimination results of the target domain images in the first set and the discrimination results of the target domain images in the second set; determining a second cross-entropy loss function according to the discrimination results of the target domain images in the first set and in the second set; and adjusting the parameters of the second discriminator according to the second cross-entropy loss function. The second cross-entropy loss function can be determined by the following formula:

$$L_{D}(D_{intra}) = -\sum_{h,w}\left[\log\left(D_{intra}(P_{te})^{(h,w)}\right) + \log\left(1 - D_{intra}(P_{th})^{(h,w)}\right)\right] \qquad (7)$$

In formula (7), $P_{te} = G_{intra}(X_{te})$ denotes the segmentation results of the target domain images in the first set, and $P_{th} = G_{intra}(X_{th})$ denotes the segmentation results of the target domain images in the second set.
Adjusting the parameters of the second generative adversarial network according to the above formulas (5)-(7), i.e., performing intra-domain adaptation, yields a (second) image segmentation model with better segmentation performance that can generate more accurate pseudo-labels. By continuously repeating the adjustment of the parameters of the second generative adversarial network, the robustness and accuracy of the (second) image segmentation model on the target domain are gradually improved until the performance saturates, and the desired (second) image segmentation model $G_{intra}$ is finally obtained for performing the segmentation task on the target domain data.
Optionally, in step S108, an image to be segmented is input into the second image segmentation model with the determined parameters to obtain the segmentation result of the image to be segmented, from which the target and the background in the image to be segmented can be determined.
In the above embodiments, the first generative adversarial network is trained using the source domain images and the target domain images; the target domain images are further divided into the first set and the second set according to the (first) image segmentation model in the trained first generative adversarial network; and the second generative adversarial network is trained based on the first set and the second set, with the (second) image segmentation model as the second generator. The solution of the present disclosure is a two-stage unsupervised model training method that combines inter-domain adaptation between the source domain and the target domain with intra-domain adaptation within the target domain. In the first stage, adversarial learning is used to achieve inter-domain adaptation, reducing the inter-domain gap between the source domain and the target domain as well as the intra-domain gap within the target domain itself. In the second stage, adversarial learning is used to gradually eliminate the intra-domain gap, progressively improving the robustness and generalization of the model on the target domain and its segmentation performance, thereby improving the accuracy of image segmentation.
Since remote sensing images are not uniform in size, with widths and heights ranging between 1000 and 5000, directly feeding high-resolution images into a deep network may overload the network and exhaust hardware resources. Moreover, the numbers of background pixels and target pixels in high-resolution images differ enormously, i.e., there is an extreme imbalance between background and road pixel samples in remote sensing satellite images, and direct training would prevent the network from effectively learning road features. To address this problem, the training sample images can be preprocessed, as described below with reference to FIG. 3. The preprocessing method can also be applied to the preprocessing of other relatively complex images in which the target and background distributions are very imbalanced.
FIG. 3 is a flowchart of other embodiments of the image segmentation method of the present disclosure. As shown in FIG. 3, the method of this embodiment includes steps S302 to S308.
In step S302, for each training sample image, the training sample image and the label mask image of the training sample image are cropped into a plurality of training sample image blocks and a plurality of label mask image blocks, respectively, according to a preset size.
The training sample images include remote sensing satellite images, and may also be other high-resolution images in which the numbers of target and background pixels differ enormously. The value of each pixel in the label mask image is 0 or 1, where 0 indicates that the pixel at the same position in the training sample image belongs to the target (road), and 1 indicates that the pixel at the same position in the training sample image belongs to the background.
For example, given a high-resolution remote sensing image, image blocks of size 512x512 are randomly cropped from the image; 20-30 blocks may be cropped depending on the size of the input image. This step is performed synchronously on the label mask image to ensure the accuracy of the labels.
In step S304, the label mask image blocks in which the number of pixels with a value of 1 exceeds a preset number, and the corresponding training sample image blocks, are selected.
Since randomly cropped image blocks may contain no road, the cropped image blocks need to be filtered. The number of road pixel samples in each block is computed from the cropped mask image block, and a preset number, for example 4000, is set; blocks whose road pixel count is greater than the preset number are retained as valid data, and the rest are discarded.
In step S306, data augmentation is performed on the training sample image blocks corresponding to the selected label mask image blocks to increase the number of training sample image blocks, obtaining preprocessed training sample image blocks.
Data augmentation strategies such as flipping and rotation can be applied to the filtered image blocks to increase the sample size and diversity of the data, so as to improve the generalization and robustness of the model.
In step S308, the preprocessed training sample image blocks are divided into source domain images and target domain images.
The source domain images carry label information, and no label information is set for the target domain images.
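The preprocessing of steps S302-S306 can be sketched as follows. Note that the text above states both that road pixels have mask value 0 and that blocks are selected by road pixel count, so the comparison below counts zero-valued mask pixels as road; adapt it if the opposite mask convention is used.

```python
import numpy as np

def crop_and_filter(image, mask, size=512, n_crops=25, min_road=4000):
    h, w = image.shape[:2]
    rng = np.random.default_rng()
    blocks = []
    for _ in range(n_crops):
        y = int(rng.integers(0, h - size + 1))
        x = int(rng.integers(0, w - size + 1))
        img_b = image[y:y+size, x:x+size]
        msk_b = mask[y:y+size, x:x+size]       # cropped in sync with image
        if (msk_b == 0).sum() > min_road:      # keep road-rich blocks only
            blocks.append((img_b, msk_b))
    return blocks

def augment(blocks):
    out = []
    for img, msk in blocks:
        out.append((img, msk))
        out.append((np.fliplr(img), np.fliplr(msk)))   # horizontal flip
        out.append((np.rot90(img), np.rot90(msk)))     # 90-degree rotation
    return out
```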
Through the preprocessing of the training sample images, a sufficient number of source domain images and target domain images suitable for training can be obtained. Addressing the poor adaptability of traditional domain adaptation methods on high-resolution remote sensing images, the method of the above embodiments designs a set of cropping, filtering, and data augmentation strategies for remote sensing road images, improving the compatibility between remote sensing images and domain adaptation methods from the traditional computer vision field.
The present disclosure also provides an image segmentation apparatus, described below with reference to FIG. 4.
FIG. 4 is a structural diagram of some embodiments of the image segmentation apparatus of the present disclosure. As shown in FIG. 4, the apparatus 40 of this embodiment includes: a first training module 410, a dividing module 420, and a second training module 430.
The first training module 410 is configured to input a source domain image and a target domain image respectively into a first generative adversarial network, and train the first generative adversarial network based on adversarial learning to obtain a trained first generative adversarial network, where the first generator in the first generative adversarial network is the (first) image segmentation model.
In some embodiments, the first training module 410 is configured to input the source domain image and the target domain image respectively into the feature extraction layer of the (first) image segmentation model to obtain a first feature map of the source domain image and a second feature map of the target domain image, respectively; input the first feature map and the second feature map respectively into the upsampling layer of the (first) image segmentation model to obtain a third feature map of the source domain image with the same size as the source domain image and a fourth feature map of the target domain image with the same size as the target domain image, respectively; and input the third feature map and the fourth feature map respectively into the softmax layer of the (first) image segmentation model to obtain the segmentation result of the source domain image and the segmentation result of the target domain image, respectively.
In some embodiments, the (first) image segmentation model includes a plurality of feature extraction layers; in the order in which the source domain image or the target domain image passes through them, the downsampling factors of the feature extraction layers increase successively, and the last two feature extraction layers are each connected to an atrous spatial pyramid pooling module. In the case where the downsampling factor of one or more of the feature extraction layers exceeds a threshold, atrous convolution is used for the one or more feature extraction layers so that their downsampling factor is kept at the threshold.
In some embodiments, the first training module 410 is configured to input the source domain image and the target domain image into the feature extraction layers in sequence; input the features output by the last two feature extraction layers through which the source domain image passes respectively into the ASPP modules to obtain the first and second multi-scale global features of the source domain image corresponding to the two output feature layers; fuse the first and second multi-scale global features to obtain the first feature map of the source domain image; input the features output by the last two feature extraction layers through which the target domain image passes respectively into the ASPP modules to obtain the third and fourth multi-scale global features of the target domain image corresponding to the two output feature layers; and fuse the third and fourth multi-scale global features to obtain the second feature map of the target domain image.
In some embodiments, the first training module 410 is configured to, in each training epoch, adjust the parameters of the (first) image segmentation model according to the difference between the segmentation result of the source domain image after passing through the (first) image segmentation model and the label information of the source domain image; adjust the (first) image segmentation model again based on adversarial learning, and adjust the parameters of the first discriminator in the first generative adversarial network; and repeat the above process until the training of the first generative adversarial network is completed.
In some embodiments, the first training module 410 is configured to input the segmentation result of the target domain image after passing through the (first) image segmentation model into the first discriminator and perform domain category discrimination on the target domain image to obtain the discrimination result of the target domain image; determine the first adversarial loss function according to the discrimination result of the target domain image; and adjust the (first) image segmentation model again according to the first adversarial loss function.
In some embodiments, the first training module 410 is configured to input the segmentation result of the source domain image after passing through the (first) image segmentation model and the segmentation result of the target domain image after passing through the (first) image segmentation model respectively into the first discriminator, and perform domain category discrimination on the source domain image and the target domain image respectively to obtain the discrimination result of the source domain image and the discrimination result of the target domain image; determine the first cross-entropy loss function according to the two discrimination results; and adjust the parameters of the first discriminator according to the first cross-entropy loss function.
The dividing module 420 is configured to divide the target domain images into a first set and a second set, where the segmentation results obtained after the target domain images in the first set pass through the (first) image segmentation model in the trained first generative adversarial network are used as label information, and no label information is set for the target domain images in the second set.
In some embodiments, the dividing module 420 is configured to input the target domain images into the (first) image segmentation model in the trained first generative adversarial network to obtain the segmentation results of the target domain images, and generate pseudo-labels of the target domain images, where the pseudo-label of a target domain image labels each pixel in the image as belonging to the target or to the background; determine the score of each target domain image according to its segmentation result and its pseudo-label, where the higher the probability values, in the segmentation result, of the pixels belonging to the target, and the fewer the pixels labeled as belonging to the target by the pseudo-label, the higher the score of the target domain image; and select, according to the scores of the target domain images, some target domain images to generate the first set, with the target domain images outside the first set generating the second set, where the pseudo-label of a target domain image in the first set is used as the label information of that image.
In some embodiments, the score of a target domain image is determined using the following formula:

$$S = \frac{\sum_{i,j} p^{(i,j)}\,\mathbb{1}\left[m^{(i,j)} = 1\right]}{\sum_{i,j} \mathbb{1}\left[m^{(i,j)} = 1\right]}$$

where $p^{(i,j)}$ denotes the probability that the pixel at (i, j) in the target domain image belongs to the target, $m^{(i,j)}$ denotes the pseudo-label of the pixel at (i, j) in the target domain image, $m^{(i,j)} = 1$ indicates that the pixel at (i, j) belongs to the target, $m^{(i,j)} = 0$ indicates that the pixel at (i, j) belongs to the background, i and j are positive integers, and (i, j) denotes the two-dimensional position of a pixel in the target domain image.
The second training module 430 is configured to input the target domain images in the first set and the second set respectively into a second generative adversarial network, and train the second generative adversarial network based on adversarial learning to obtain a trained second generative adversarial network, thereby determining the parameters of the (second) image segmentation model in the second generative adversarial network, where the second generator in the second generative adversarial network is the (second) image segmentation model, the second image segmentation model has the same structure as the first image segmentation model, and the initial parameters of the second image segmentation model are assigned the parameters of the first generator in the trained first generative adversarial network.
In some embodiments, the second training module 430 is configured to, in each training epoch, adjust the parameters of the (second) image segmentation model according to the difference between the segmentation results of the target domain images in the first set after passing through the (second) image segmentation model and the label information of the target domain images in the first set; adjust the (second) image segmentation model again based on adversarial learning, and adjust the parameters of the second discriminator in the second generative adversarial network; and repeat the above process until the training of the second generative adversarial network is completed.
In some embodiments, the second training module 430 is configured to input the segmentation results of the target domain images in the second set after passing through the (second) image segmentation model into the second discriminator, and perform domain category discrimination on the target domain images in the second set to obtain the discrimination results of the target domain images in the second set; determine the second adversarial loss function according to these discrimination results; and adjust the (second) image segmentation model again according to the second adversarial loss function.
In some embodiments, the second training module 430 is configured to input the segmentation results of the target domain images in the first set after passing through the (second) image segmentation model and the segmentation results of the target domain images in the second set after passing through the (second) image segmentation model respectively into the second discriminator, and perform domain category discrimination on the target domain images in the first set and in the second set respectively to obtain the discrimination results of the target domain images in the first set and in the second set; determine the second cross-entropy loss function according to these discrimination results; and adjust the parameters of the second discriminator according to the second cross-entropy loss function.
In some embodiments, the source domain images and the target domain images are remote sensing satellite images including roads. The apparatus 40 further includes a preprocessing module 440 configured to crop a training sample image and the label mask image of the training sample image into a plurality of training sample image blocks and a plurality of label mask image blocks, respectively, according to a preset size, where the training sample image includes a remote sensing satellite image, the value of each pixel in the label mask image is 0 or 1, 0 indicates that the pixel at the same position in the training sample image belongs to a road, and 1 indicates that the pixel at the same position in the training sample image belongs to the background; select the label mask image blocks in which the number of pixels with a value of 1 exceeds a preset number, and the corresponding training sample image blocks; perform data augmentation on the training sample image blocks corresponding to the selected label mask image blocks to increase the number of training sample image blocks, obtaining preprocessed training sample image blocks; and divide the preprocessed training sample image blocks into source domain images and target domain images.
In some embodiments, the apparatus 40 further includes an image segmentation module 450 configured to input an image to be segmented into the (second) image segmentation model with the determined parameters to obtain the segmentation result of the image to be segmented.
The image segmentation apparatuses in the embodiments of the present disclosure can each be implemented by various computing devices or computer systems, described below with reference to FIG. 5 and FIG. 6.
FIG. 5 is a structural diagram of some embodiments of the image segmentation apparatus of the present disclosure. As shown in FIG. 5, the apparatus 50 of this embodiment includes: a memory 510 and a processor 520 coupled to the memory 510, the processor 520 being configured to execute the image segmentation method in any of the embodiments of the present disclosure based on instructions stored in the memory 510.
The memory 510 may include, for example, a system memory, a fixed non-volatile storage medium, and the like. The system memory stores, for example, an operating system, application programs, a boot loader, a database, and other programs.
FIG. 6 is a structural diagram of other embodiments of the image segmentation apparatus of the present disclosure. As shown in FIG. 6, the apparatus 60 of this embodiment includes: a memory 610 and a processor 620, which are similar to the memory 510 and the processor 520, respectively. It may further include an input/output interface 630, a network interface 640, a storage interface 650, and the like. These interfaces 630, 640, 650, as well as the memory 610 and the processor 620, may be connected, for example, through a bus 660. The input/output interface 630 provides a connection interface for input and output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 640 provides a connection interface for various networked devices; for example, it can be connected to a database server or a cloud storage server. The storage interface 650 provides a connection interface for external storage devices such as SD cards and USB flash drives.
Those skilled in the art should understand that the embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code therein.
The present disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus configured to implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps configured to implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above are merely preferred embodiments of the present disclosure and are not intended to limit the present disclosure. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present disclosure shall be included in the protection scope of the present disclosure.

Claims (17)

  1. An image segmentation method, comprising:
    inputting a source domain image and a target domain image respectively into a first generative adversarial network, and training the first generative adversarial network based on adversarial learning to obtain a trained first generative adversarial network, wherein a first generator in the first generative adversarial network is a first image segmentation model;
    dividing the target domain images into a first set and a second set, wherein segmentation results obtained after the target domain images in the first set pass through the first image segmentation model in the trained first generative adversarial network are used as label information, and no label information is set for the target domain images in the second set; and
    inputting the target domain images in the first set and the second set respectively into a second generative adversarial network, and training the second generative adversarial network based on adversarial learning to obtain a trained second generative adversarial network, thereby determining parameters of a second image segmentation model in the second generative adversarial network, wherein a second generator in the second generative adversarial network is the second image segmentation model, the second image segmentation model has the same structure as the first image segmentation model, and initial parameters of the second image segmentation model are assigned the parameters of the first generator in the trained first generative adversarial network.
  2. The image segmentation method according to claim 1, wherein inputting the source domain image and the target domain image respectively into the first generative adversarial network comprises:
    inputting the source domain image and the target domain image respectively into a feature extraction layer of the first image segmentation model to obtain a first feature map of the source domain image and a second feature map of the target domain image, respectively;
    inputting the first feature map and the second feature map respectively into an upsampling layer of the first image segmentation model to obtain a third feature map of the source domain image with the same size as the source domain image and a fourth feature map of the target domain image with the same size as the target domain image, respectively; and
    inputting the third feature map and the fourth feature map respectively into a softmax layer of the first image segmentation model to obtain a segmentation result of the source domain image and a segmentation result of the target domain image, respectively.
  3. The image segmentation method according to claim 2, wherein the first image segmentation model comprises a plurality of feature extraction layers; in the order in which the source domain image or the target domain image passes through them, the downsampling factors of the feature extraction layers increase successively, and the last two feature extraction layers are each connected to an atrous spatial pyramid pooling module; and
    in the case where the downsampling factor of one or more of the feature extraction layers exceeds a threshold, atrous convolution is used for the one or more feature extraction layers so that the downsampling factor of the one or more feature extraction layers is kept at the threshold.
  4. The image segmentation method according to claim 3, wherein inputting the source domain image and the target domain image respectively into the feature extraction layer of the first image segmentation model to obtain the first feature map of the source domain image and the second feature map of the target domain image, respectively, comprises:
    inputting the source domain image and the target domain image into the feature extraction layers in sequence;
    inputting the features output by the last two feature extraction layers through which the source domain image passes respectively into the atrous spatial pyramid pooling modules to obtain a first multi-scale global feature and a second multi-scale global feature of the source domain image corresponding to the two output feature layers, respectively;
    fusing the first multi-scale global feature and the second multi-scale global feature to obtain the first feature map of the source domain image;
    inputting the features output by the last two feature extraction layers through which the target domain image passes respectively into the atrous spatial pyramid pooling modules to obtain a third multi-scale global feature and a fourth multi-scale global feature of the target domain image corresponding to the two output feature layers, respectively; and
    fusing the third multi-scale global feature and the fourth multi-scale global feature to obtain the second feature map of the target domain image.
  5. The image segmentation method according to claim 1, wherein training the first generative adversarial network based on adversarial learning comprises:
    in each training epoch, adjusting parameters of the first image segmentation model according to the difference between the segmentation result of the source domain image after passing through the first image segmentation model and the label information of the source domain image;
    adjusting the first image segmentation model again based on adversarial learning, and adjusting parameters of a first discriminator in the first generative adversarial network; and
    repeating the above process until the training of the first generative adversarial network is completed.
  6. The image segmentation method according to claim 5, wherein adjusting the first image segmentation model again based on adversarial learning comprises:
    inputting the segmentation result of the target domain image after passing through the first image segmentation model into the first discriminator, and performing domain category discrimination on the target domain image to obtain a discrimination result of the target domain image;
    determining a first adversarial loss function according to the discrimination result of the target domain image; and
    adjusting the first image segmentation model again according to the first adversarial loss function.
  7. The image segmentation method according to claim 5, wherein adjusting the parameters of the first discriminator in the first generative adversarial network comprises:
    inputting the segmentation result of the source domain image after passing through the first image segmentation model and the segmentation result of the target domain image after passing through the first image segmentation model respectively into the first discriminator, and performing domain category discrimination on the source domain image and the target domain image respectively to obtain a discrimination result of the source domain image and a discrimination result of the target domain image;
    determining a first cross-entropy loss function according to the discrimination result of the source domain image and the discrimination result of the target domain image; and
    adjusting the parameters of the first discriminator according to the first cross-entropy loss function.
  8. The image segmentation method according to claim 1, wherein dividing the target domain images into the first set and the second set comprises:
    inputting the target domain images into the first image segmentation model in the trained first generative adversarial network to obtain segmentation results of the target domain images, and generating pseudo-labels of the target domain images, wherein the pseudo-label of a target domain image is used to label each pixel in the target domain image as belonging to the target or to the background;
    determining a score of the target domain image according to the segmentation result of the target domain image and the pseudo-label of the target domain image, wherein the higher the probability values, in the segmentation result, of the pixels belonging to the target in the target domain image, and the fewer the pixels labeled as belonging to the target by the pseudo-label of the target domain image, the higher the score of the target domain image; and
    selecting, according to the scores of the target domain images, some of the target domain images to generate the first set, with the target domain images outside the first set generating the second set, wherein the pseudo-label of a target domain image in the first set is used as the label information of that target domain image.
  9. The image segmentation method according to claim 8, wherein the score of the target domain image is determined using the following formula:

    $$S = \frac{\sum_{i,j} p^{(i,j)}\,\mathbb{1}\left[m^{(i,j)} = 1\right]}{\sum_{i,j} \mathbb{1}\left[m^{(i,j)} = 1\right]}$$

    wherein $p^{(i,j)}$ denotes the probability that the pixel at (i, j) in the target domain image belongs to the target, $m^{(i,j)}$ denotes the pseudo-label of the pixel at (i, j) in the target domain image, $m^{(i,j)} = 1$ indicates that the pixel at (i, j) in the target domain image belongs to the target, $m^{(i,j)} = 0$ indicates that the pixel at (i, j) in the target domain image belongs to the background, i and j are positive integers, and (i, j) denotes the two-dimensional position of a pixel in the target domain image.
  10. The image segmentation method according to claim 1, wherein training the second generative adversarial network based on adversarial learning comprises:
    in each training epoch, adjusting parameters of the second image segmentation model according to the difference between the segmentation results of the target domain images in the first set after passing through the second image segmentation model and the label information of the target domain images in the first set;
    adjusting the second image segmentation model again based on adversarial learning, and adjusting parameters of a second discriminator in the second generative adversarial network; and
    repeating the above process until the training of the second generative adversarial network is completed.
  11. The image segmentation method according to claim 10, wherein adjusting the second image segmentation model again based on adversarial learning comprises:
    inputting the segmentation results of the target domain images in the second set after passing through the second image segmentation model into the second discriminator, and performing domain category discrimination on the target domain images in the second set to obtain discrimination results of the target domain images in the second set;
    determining a second adversarial loss function according to the discrimination results of the target domain images in the second set; and
    adjusting the second image segmentation model again according to the second adversarial loss function.
  12. The image segmentation method according to claim 10, wherein adjusting the parameters of the second discriminator in the second generative adversarial network comprises:
    inputting the segmentation results of the target domain images in the first set after passing through the second image segmentation model and the segmentation results of the target domain images in the second set after passing through the second image segmentation model respectively into the second discriminator, and performing domain category discrimination on the target domain images in the first set and in the second set respectively to obtain discrimination results of the target domain images in the first set and discrimination results of the target domain images in the second set;
    determining a second cross-entropy loss function according to the discrimination results of the target domain images in the first set and the discrimination results of the target domain images in the second set; and
    adjusting the parameters of the second discriminator according to the second cross-entropy loss function.
  13. The image segmentation method according to claim 1, wherein the source domain image and the target domain image are remote sensing satellite images including roads;
    the method further comprises:
    cropping a training sample image and a label mask image of the training sample image into a plurality of training sample image blocks and a plurality of label mask image blocks, respectively, according to a preset size, wherein the training sample image comprises a remote sensing satellite image, the value of each pixel in the label mask image is 0 or 1, 0 indicates that the pixel at the same position in the training sample image as the pixel in the label mask image belongs to a road, and 1 indicates that the pixel at the same position in the training sample image as the pixel in the label mask image belongs to the background;
    selecting the label mask image blocks in which the number of pixels with a value of 1 exceeds a preset number, and the corresponding training sample image blocks;
    performing data augmentation on the training sample image blocks corresponding to the selected label mask image blocks to increase the number of training sample image blocks, obtaining preprocessed training sample image blocks; and
    dividing the preprocessed training sample image blocks into the source domain images and the target domain images.
  14. The image segmentation method according to claim 1, further comprising:
    inputting an image to be segmented into the second image segmentation model with the determined parameters to obtain a segmentation result of the image to be segmented.
  15. An image segmentation apparatus, comprising:
    a first training module configured to input a source domain image and a target domain image respectively into a first generative adversarial network, and train the first generative adversarial network based on adversarial learning to obtain a trained first generative adversarial network, wherein a first generator in the first generative adversarial network is a first image segmentation model;
    a dividing module configured to divide the target domain images into a first set and a second set, wherein segmentation results obtained after the target domain images in the first set pass through the first image segmentation model in the trained first generative adversarial network are used as label information, and no label information is set for the target domain images in the second set; and
    a second training module configured to input the target domain images in the first set and the second set respectively into a second generative adversarial network, and train the second generative adversarial network based on adversarial learning to obtain a trained second generative adversarial network, thereby determining parameters of a second image segmentation model in the second generative adversarial network, wherein a second generator in the second generative adversarial network is the second image segmentation model, the second image segmentation model has the same structure as the first image segmentation model, and initial parameters of the second image segmentation model are assigned the parameters of the first generator in the trained first generative adversarial network.
  16. An image segmentation apparatus, comprising:
    a processor; and
    a memory coupled to the processor and configured to store instructions which, when executed by the processor, cause the processor to perform the image segmentation method according to any one of claims 1-14.
  17. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1-14.
PCT/CN2022/071371 2021-03-26 2022-01-11 Decoding method, apparatus and computer-readable storage medium WO2022199225A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110325191.9 2021-03-26
CN202110325191.9A CN115205694A (zh) 2021-03-26 Image segmentation method, apparatus and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2022199225A1 true WO2022199225A1 (zh) 2022-09-29

Family

ID=83396306

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/071371 WO2022199225A1 (zh) Decoding method, apparatus and computer-readable storage medium 2021-03-26 2022-01-11

Country Status (2)

Country Link
CN (1) CN115205694A (zh)
WO (1) WO2022199225A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115661001B (zh) * 2022-12-14 2023-04-07 Linyi University Single-channel coal-rock image enhancement method based on generative adversarial network
CN116895003B (zh) * 2023-09-07 2024-01-30 苏州魔视智能科技有限公司 Method and apparatus for segmenting a target object, computer device, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875935A (zh) * 2018-06-11 2018-11-23 Lanzhou University of Technology Visual feature mapping method for natural image object material based on generative adversarial network
CN110276811A (zh) * 2019-07-02 2019-09-24 厦门美图之家科技有限公司 Image conversion method and apparatus, electronic device, and readable storage medium
CN111340819A (zh) * 2020-02-10 2020-06-26 Tencent Technology (Shenzhen) Co., Ltd. Image segmentation method, apparatus, and storage medium
CN111723780A (zh) * 2020-07-22 2020-09-29 Zhejiang University Directional transfer method and system for cross-domain data based on high-resolution remote sensing images

Also Published As

Publication number Publication date
CN115205694A (zh) 2022-10-18

Similar Documents

Publication Publication Date Title
CN107341517B (zh) Multi-scale small object detection method based on inter-level feature fusion in deep learning
CN109815886B (zh) Pedestrian and vehicle detection method and system based on improved YOLOv3
Oršić et al. Efficient semantic segmentation with pyramidal fusion
Serna et al. Classification of traffic signs: The european dataset
Ping et al. A deep learning approach for street pothole detection
Schlosser et al. Fusing LIDAR and images for pedestrian detection using convolutional neural networks
Bi et al. Improved VGG model-based efficient traffic sign recognition for safe driving in 5G scenarios
CN110879959B (zh) Method and apparatus for generating a dataset, and testing method and testing apparatus using the same
CN107358262B (zh) Classification method and classification apparatus for high-resolution images
US11810326B2 (en) Determining camera parameters from a single digital image
CN110766098A (zh) Small object detection method for traffic scenes based on improved YOLOv3
WO2022199225A1 (zh) Decoding method, apparatus and computer-readable storage medium
Kondapally et al. Towards a Transitional Weather Scene Recognition Approach for Autonomous Vehicles
CN111723693B (zh) Crowd counting method based on few-shot learning
CN111738055B (zh) Multi-category text detection system and bill/form detection method based on the system
CN112257637A (zh) Vehicle-mounted laser point cloud multi-target recognition method fusing point clouds and multiple views
CN110879960B (zh) Method and computing device for generating an image dataset for convolutional neural network learning
WO2023212997A1 (zh) Knowledge-distillation-based neural network training method, device, and storage medium
CN111797846B (zh) Feedback-based object detection method based on feature pyramid network
CN105574545B (zh) Multi-view semantic segmentation method and apparatus for street environment images
CN115471467A (zh) Building change detection method for high-resolution optical remote sensing images
CN111507359A (zh) Adaptive weighted fusion method for image feature pyramids
CN114519819B (zh) Remote sensing image object detection method based on global context awareness
Liu et al. Fine-grained multilevel fusion for anti-occlusion monocular 3d object detection
Meng et al. A block object detection method based on feature fusion networks for autonomous vehicles

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22773888

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 14-02-2024)