WO2023221328A1 - Semantic segmentation method, device and storage medium based on multispectral images

Publication number
WO2023221328A1
Authority
WO
WIPO (PCT)
Prior art keywords
semantic segmentation
category
multispectral
features
spectral
Prior art date
Application number
PCT/CN2022/115291
Other languages
English (en)
French (fr)
Inventor
谭明奎
罗佩瑶
李振梁
杜永红
Original Assignee
华南理工大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华南理工大学
Publication of WO2023221328A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30181 Earth observation

Definitions

  • the present invention relates to the field of computer vision technology, and in particular to a semantic segmentation method, device and storage medium based on multispectral images.
  • existing methods introduce multispectral images and integrate visual information of light at different wavelengths to make up for the defects of RGB images affected by factors such as illumination, thereby improving model performance.
  • most existing methods simply fuse visible and non-visible light information without considering that pixels in the same category have similar multispectral characteristics, making it difficult to solve the problem of large intra-class differences.
  • some methods introduce category context information only in the spatial dimension and do not take into account the varying degrees of redundancy of multispectral features between different categories. They therefore struggle with the interference and noise caused by redundant information and suffer from small inter-class differences.
  • the purpose of the present invention is to provide a semantic segmentation method, device and storage medium based on multispectral images.
  • a semantic segmentation method based on multispectral images including the following steps:
  • the semantic segmentation model includes a category-spectrum correlation module, which is used to improve the similarity between pixels of the same category and reduce the differences between classes to obtain continuous and accurate segmentation results.
  • the semantic segmentation model also includes a spectral channel enhancement module
  • the spectral channel enhancement module uses the channel attention mechanism to focus on important information in multispectral features and to reduce the redundant information of different categories in those features.
  • collection and labeling of multispectral data sets for semantic segmentation include:
  • fixed-size images are cropped at random from the high-resolution images in the training set, and fixed-size images are cropped with a sliding window from the high-resolution images in the validation set and test set.
  • the semantic segmentation model also includes an encoder, a spectral channel enhancement module and a decoder;
  • the encoder is used to extract features from multispectral images
  • the category-spectrum correlation module is used to obtain preliminary segmentation results in a supervised manner, and performs soft category mean pooling on multi-spectral features to obtain a category-spectrum relationship matrix, thereby reducing intra-class differences;
  • the spectral channel enhancement module is used to calculate the channel attention score of each category, assign weights to feature channels in different category areas, reduce redundant information in multi-spectral features of different categories, and thereby increase inter-category differences;
  • the decoder is used to decode multispectral features and output semantic segmentation results.
  • X_h is defined as the high-level feature after the first upsampling in the decoder
  • X_l is defined as the low-level feature output by the first stage of the backbone network in the encoder
  • the workflow of the category-spectrum correlation module is as follows:
  • the pixel features belonging to different category areas in the category attention map X_p are average pooled separately to obtain N multispectral features, which are combined with the feature X to form the category-spectrum relationship matrix M.
  • the workflow of the spectral channel enhancement module is as follows:
  • the encoder includes a backbone network and ASPP module
  • the backbone network uses atrous convolution instead of downsampling operation to increase the receptive field while preventing resolution degradation;
  • the ASPP module is used to fuse features extracted by multiple convolutional layers with different dilation rates to extract multi-scale contextual features.
  • using the multispectral data set to train the semantic segmentation model includes:
  • the cross-entropy function is used to calculate the loss
  • the stochastic gradient descent algorithm is used to update the parameters of the network until convergence.
  • a semantic segmentation device based on multispectral images including:
  • At least one memory for storing at least one program
  • when the at least one program is executed by the at least one processor, the at least one processor implements the above method.
  • a computer-readable storage medium in which a processor-executable program is stored, and the processor-executable program is used to perform the above method when executed by the processor.
  • the beneficial effects of the present invention are: by improving the similarity between pixels of the same category and reducing the differences between classes, the present invention can better extract complementary information from multispectral images and solve the problem of large intra-class differences.
  • Figure 1 is a step flow chart of a semantic segmentation method based on multispectral images in an embodiment of the present invention
  • Figure 2 is a schematic structural diagram of a semantic segmentation model based on multispectral images in a high-altitude scene in an embodiment of the present invention
  • Figure 3 is a schematic diagram of a category-spectrum correlation module in an embodiment of the present invention.
  • Figure 4 is a schematic diagram of a spectral channel enhancement module in an embodiment of the present invention.
  • Figure 5 is an example diagram of error predictions with large intra-class differences and small inter-class differences in the embodiment of the present invention.
  • orientation descriptions such as up, down, front, back, left and right are based on the orientation or position relationships shown in the drawings and serve only to facilitate and simplify the description of the present invention. They do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation, and therefore should not be construed as a limitation of the present invention.
  • on the one hand, the invention uses supervision to obtain rough segmentation results, prompting the model to partition pixel features correctly; on the other hand, it considers same-category pixels that are far apart or differ greatly in appearance, enhancing the semantic expression of pixel features within a category and reducing intra-class differences.
  • the present invention extracts multispectral features for each category, that is, constructs the relationship between categories and spectra.
  • the present invention uses the channel attention mechanism to reduce the redundant information of multispectral features in different categories. Because different categories attend to different spectral feature channels, the invention calculates channel attention scores for each category, applies different channel attention scores to pixels in different category areas, and reduces the influence of noise in multispectral features category by category, thereby increasing inter-class differences.
  • this embodiment provides a semantic segmentation method based on multispectral images, which specifically includes the following steps:
  • step S1 includes the following steps S11-S13:
  • the semantic segmentation model extracts features from the input multispectral image, assigns a category label to each pixel, and finally obtains the semantic segmentation result.
  • the semantic segmentation model mainly contains four parts: (1) encoder: extracts features from multispectral images; (2) category-spectrum correlation module: obtains preliminary segmentation results in a supervised manner and performs soft category mean pooling on multispectral features to obtain a category-spectrum relationship matrix, thereby reducing intra-class differences; (3) spectral channel enhancement module: calculates the channel attention score of each category, assigns weights to the feature channels of different category areas, and reduces the redundant information of different categories in multispectral features, thereby increasing inter-class differences; (4) decoder: decodes multispectral features and outputs the semantic segmentation result.
  • the encoder works as follows: input the multi-channel multispectral image I_spec into the encoder to extract features.
  • the encoder consists of two parts: the backbone network and the ASPP module.
  • the backbone network is a ResNet101 model pre-trained on the ImageNet data set. Because the ResNet model downsamples five times, the resolution of its output features is 1/32 of the input image, that is, the output stride is 32, which loses a large amount of detail and degrades model performance. The backbone network therefore uses atrous convolution instead of downsampling operations to enlarge the receptive field while preventing resolution degradation.
  • the downsampling operations within the last two residual blocks are replaced with atrous convolutions with dilation rates of 2 and 4 respectively, so that the output stride of the network is 8.
  • the present invention uses the ASPP module to fuse features extracted by multiple convolutional layers with different dilation rates to extract multi-scale contextual features.
  • the category-spectrum correlation module works as follows: first define X_h as the high-level feature after the first upsampling in the decoder, which has rich category information.
  • X_l is defined as the low-level feature output by the first stage of the backbone network in the encoder, which has rich detailed information.
  • This module is divided into two steps: (1) Generate the category attention map X_p. Through supervision, the network learns the attention map of each category, that is, the probability that each pixel belongs to that category. (2) Calculate the category-spectrum relationship matrix M. Based on the self-attention mechanism, the relationship between categories and spectra is obtained through matrix operations.
  • the steps for generating the category attention map X_p are as follows: this embodiment reduces the number of channels of the high-level semantic feature X_h to the number of categories N through a convolution layer with a 1x1 kernel, and applies a softmax operation over the category dimension to obtain N attention maps, that is, the category attention map X_p. To better learn the relationship between categories and spectra, the present invention adopts supervised learning in the training stage and introduces a loss function that drives X_p toward the semantic segmentation label.
  • the steps for calculating the category-spectrum relationship matrix M are as follows: reduce the number of channels of the multispectral feature X_l through a convolution layer with a 1x1 kernel to obtain the feature X.
  • the present invention integrates category information into multi-spectral features by extracting the feature commonality of pixels of the same category, thereby reducing intra-category differences.
  • soft category mean pooling is used: the pixel features belonging to different category areas in X_p are average pooled separately, yielding N multispectral features that form the category-spectrum relationship matrix M.
  • the spectral channel enhancement module works as follows: focus on different spectral feature channels according to different categories.
  • the present invention uses the differences between categories (inter-category context) to enhance multispectral features and reduce the search space of other categories.
  • This module is mainly divided into two parts: (1) Calculate the spectral channel attention score A. Convert the category-spectrum relationship matrix M into the channel attention score of each category; (2) Redistribute the weights of the spectral feature channels. Pixels in the same category are multiplied by corresponding channel attention scores, and pixels in different category areas are multiplied with different channel attention scores, thereby reducing the impact of noise in multispectral features and increasing inter-category differences.
  • the steps for calculating the spectral channel attention score A are as follows:
  • the category-spectrum relationship matrix M is regarded as N independent spectral features, and N channel attention scores are calculated separately based on the channel attention mechanism to form the spectral channel attention score A.
  • the channel attention mechanism used in the present invention consists of channel expansion and squeeze operations, with a squeeze ratio of 32.
  • the steps for redistributing weights to spectral feature channels are as follows: extract features from the low-level multispectral feature X_l through a convolution layer with a 1x1 kernel to obtain the feature X_v. Then, channel weights are reassigned to the pixel features belonging to the area of category k in X_p, and the result is concatenated with the original multispectral feature X_l to obtain the output feature Y.
  • the decoder works as follows: input the channel-enhanced spectral feature Y into the decoder, splice it with the original features of the decoder, and output the semantic segmentation result.
  • This embodiment enables the model to output accurate segmentation results by fusing multi-spectral features with rich details and high-level features with rich semantics.
  • the loss is calculated through the cross entropy function, and the stochastic gradient descent algorithm is used to update the parameters of the network until convergence. Evaluate and test on the validation set and test set respectively.
  • the semantic segmentation method based on multispectral images in high-altitude scenes proposed in this embodiment can, on the one hand, enhance the semantic expression of pixel features within a category and reduce intra-class differences, and on the other hand, reduce the impact of noise in multispectral features and thereby increase inter-class differences.
  • Tables 1 and 2 show the comparison results with the best existing methods on the Potsdam dataset and Vaihingen dataset respectively. After applying this solution, the performance of the semantic segmentation model can be improved on both commonly used multispectral data sets.
  • this embodiment has the following advantages and beneficial effects:
  • On the one hand, the invention uses supervision to obtain rough segmentation results, prompting the model to partition pixel features correctly; on the other hand, it considers same-category pixels that are far apart or differ greatly in appearance, enhancing the semantic expression of pixel features within a category and reducing intra-class differences. Finally, the present invention extracts multispectral features for each category, that is, constructs the relationship between categories and spectra. This effectively addresses the problem of large intra-class differences.
  • the present invention uses the channel attention mechanism to reduce the redundant information of multispectral features in different categories. Because different categories attend to different spectral feature channels, the invention calculates channel attention scores for each category, applies different channel attention scores to pixels in different category areas, and reduces the influence of noise in multispectral features category by category, thereby increasing inter-class differences. This addresses the interference and noise caused by redundancy and the resulting problem of small inter-class differences.
  • This embodiment also provides a semantic segmentation device based on multispectral images, including:
  • At least one memory for storing at least one program
  • when the at least one program is executed by the at least one processor, the at least one processor implements the method shown in FIG. 1.
  • the multispectral image-based semantic segmentation device of this embodiment can execute the multispectral image-based semantic segmentation method provided by the method embodiment of the present invention, can execute any combination of the implementation steps of the method embodiment, and has the corresponding functions and beneficial effects of the method.
  • the embodiment of the present application also discloses a computer program product or computer program.
  • the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device can read the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the method shown in FIG. 1 .
  • This embodiment also provides a storage medium that stores instructions or programs capable of executing the semantic segmentation method based on multispectral images provided by the method embodiment of the present invention. When the instructions or programs are run, any combination of the implementation steps of the method embodiment can be executed, with the corresponding functions and beneficial effects of the method.
  • the functions/operations noted in the block diagrams may occur out of the order noted in the operational illustrations.
  • two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending on the functionality/operations involved.
  • the embodiments presented and described in the flow diagrams of the present invention are provided by way of example for the purpose of providing a more comprehensive understanding of the technology. The disclosed methods are not limited to the operations and logical flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of a larger operation are performed independently.
  • if the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • the technical solution of the present invention, in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention.
  • the aforementioned storage media include: U disk, mobile hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), magnetic disk, optical disk and other media that can store program code.
  • a "computer-readable medium” may be any device that can contain, store, communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • More specific examples (a non-exhaustive list) of computer-readable media include the following: electrical connections with one or more wires (electronic device), portable computer disk cartridges (magnetic device), random access memory (RAM), read-only memory (ROM), erasable and programmable read-only memory (EPROM or flash memory), fiber optic devices, and portable compact disc read-only memory (CDROM).
  • the computer-readable medium may even be paper or another suitable medium on which the program is printed, because the paper or other medium can, for example, be optically scanned and then edited, interpreted or otherwise processed in a suitable manner as necessary to obtain the program electronically, which is then stored in computer memory.
  • various parts of the present invention may be implemented in hardware, software, firmware, or a combination thereof.
  • various steps or methods may be implemented using software or firmware stored in a memory and executed by a suitable instruction execution system.
  • discrete logic circuits having logic gate circuits for implementing logic functions on data signals, application specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGA), field programmable gate arrays (FPGA), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

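The present invention discloses a semantic segmentation method, device and storage medium based on multispectral images. The method includes: collecting and labeling a multispectral data set for semantic segmentation; constructing a semantic segmentation model; training the semantic segmentation model with the multispectral data set; and obtaining an image to be processed, inputting it into the trained semantic segmentation model, and outputting a semantic segmentation result. The semantic segmentation model includes a category-spectrum correlation module, which is used to improve the similarity between pixels of the same category and reduce the differences between classes, so as to obtain continuous and accurate segmentation results. By improving the similarity between pixels of the same category and reducing the differences between classes, the invention can better extract complementary information from multispectral images and solves the problem of large intra-class differences. The invention can be widely applied in the field of computer vision technology.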

Description

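Semantic segmentation method, device and storage medium based on multispectral images
Technical Field
The present invention relates to the field of computer vision technology, and in particular to a semantic segmentation method, device and storage medium based on multispectral images.
Background Art
In recent years, as ground-observation data from high-altitude scenes has grown rapidly, semantic segmentation has been widely applied to high-resolution remote sensing images. The task usually takes only an RGB image as input and achieves good results by learning semantic representations from rich texture information. However, such methods analyze objects from a single image view only, that is, they distinguish object categories solely by the texture information of visible light, which limits the model.
To address this, existing methods introduce multispectral images and integrate visual information of light at different wavelengths to compensate for the sensitivity of RGB images to factors such as illumination, thereby improving model performance. However, most existing methods simply fuse visible and non-visible light information without considering that pixels of the same category have similar multispectral features, so they struggle with large intra-class differences. In addition, some methods introduce category context information only in the spatial dimension and do not consider that multispectral features are redundant to different degrees across categories; they therefore struggle with the interference and noise caused by redundant information and suffer from small inter-class differences.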
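Summary of the Invention
To solve, at least to some extent, one of the technical problems in the prior art, the purpose of the present invention is to provide a semantic segmentation method, device and storage medium based on multispectral images.
The technical solution adopted by the present invention is:
A semantic segmentation method based on multispectral images, including the following steps:
collecting and labeling a multispectral data set for semantic segmentation;
constructing a semantic segmentation model;
training the semantic segmentation model with the multispectral data set;
obtaining an image to be processed, inputting the image into the trained semantic segmentation model, and outputting a semantic segmentation result;
the semantic segmentation model includes a category-spectrum correlation module, which is used to improve the similarity between pixels of the same category and reduce the differences between classes, so as to obtain continuous and accurate segmentation results.
Further, the semantic segmentation model also includes a spectral channel enhancement module;
the spectral channel enhancement module uses the channel attention mechanism to focus on important information in multispectral features, so as to reduce the redundant information of different categories in multispectral features.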
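Further, collecting and labeling the multispectral data set for semantic segmentation includes:
collecting multispectral images from high-altitude scenes and labeling them to build a multispectral data set;
dividing the labeled multispectral data set into a training set, a validation set and a test set;
cropping fixed-size images at random from the high-resolution images in the training set, and cropping fixed-size images with a sliding window from the high-resolution images in the validation set and test set.
Further, the semantic segmentation model also includes an encoder, a spectral channel enhancement module and a decoder;
the encoder is used to extract features from multispectral images;
the category-spectrum correlation module is used to obtain preliminary segmentation results in a supervised manner and to perform soft category mean pooling on multispectral features to obtain a category-spectrum relationship matrix, thereby reducing intra-class differences;
the spectral channel enhancement module is used to calculate the channel attention score of each category, assign weights to the feature channels of different category areas, and reduce the redundant information of different categories in multispectral features, thereby increasing inter-class differences;
the decoder is used to decode multispectral features and output the semantic segmentation result.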
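Further, X_h is defined as the high-level feature after the first upsampling in the decoder, and X_l is defined as the low-level feature output by the first stage of the backbone network in the encoder;
the workflow of the category-spectrum correlation module is as follows:
reducing the number of channels of the high-level feature X_h to the number of categories N, and applying a softmax operation over the category dimension to obtain N attention maps, which serve as the category attention map X_p;
reducing the number of channels of the low-level feature X_l to obtain the feature X;
average pooling separately the pixel features belonging to different category areas in the category attention map X_p to obtain N multispectral features, and combining them with the feature X to form the category-spectrum relationship matrix M.
Further, the workflow of the spectral channel enhancement module is as follows:
treating the category-spectrum relationship matrix M as N independent spectral features, calculating N channel attention scores separately based on the channel attention mechanism, and forming the spectral channel attention score A;
performing feature extraction on the low-level feature X_l to obtain the feature X_v;
according to the spectral channel attention score A and the feature X_v, reassigning channel weights to the pixel features belonging to the area of category k in the category attention map X_p, and concatenating the result with the low-level feature X_l to obtain the output feature Y.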
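Further, the encoder includes a backbone network and an ASPP module;
the backbone network uses atrous convolution instead of downsampling operations, so as to enlarge the receptive field while preventing resolution degradation;
the ASPP module is used to fuse features extracted by multiple convolution layers with different dilation rates to extract multi-scale contextual features.
Further, training the semantic segmentation model with the multispectral data set includes:
calculating the loss with the cross-entropy function and updating the parameters of the network with the stochastic gradient descent algorithm until convergence.
Another technical solution adopted by the present invention is:
A semantic segmentation device based on multispectral images, including:
at least one processor;
at least one memory for storing at least one program;
when the at least one program is executed by the at least one processor, the at least one processor implements the method described above.
Another technical solution adopted by the present invention is:
A computer-readable storage medium storing a processor-executable program which, when executed by a processor, is used to perform the method described above.
The beneficial effects of the present invention are: by improving the similarity between pixels of the same category and reducing the differences between classes, the invention can better extract complementary information from multispectral images and solves the problem of large intra-class differences.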
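Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings of the relevant technical solutions are introduced below. It should be understood that the drawings introduced below serve only to describe some embodiments of the technical solution of the invention conveniently and clearly, and that those skilled in the art can derive other drawings from them without creative effort.
Figure 1 is a flow chart of the steps of a semantic segmentation method based on multispectral images in an embodiment of the present invention;
Figure 2 is a schematic structural diagram of a semantic segmentation model based on multispectral images in a high-altitude scene in an embodiment of the present invention;
Figure 3 is a schematic diagram of the category-spectrum correlation module in an embodiment of the present invention;
Figure 4 is a schematic diagram of the spectral channel enhancement module in an embodiment of the present invention;
Figure 5 is an example diagram of erroneous predictions caused by large intra-class differences and small inter-class differences in an embodiment of the present invention.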
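Detailed Description
The embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the drawings, where the same or similar reference numerals throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain the invention, and cannot be understood as limiting it. The step numbers in the following embodiments are set only for convenience of explanation and do not restrict the order of the steps; the execution order of the steps can be adapted according to the understanding of those skilled in the art.
In the description of the present invention, it should be understood that orientation descriptions such as up, down, front, back, left and right are based on the orientation or position relationships shown in the drawings and serve only to facilitate and simplify the description of the invention. They do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation, and therefore cannot be understood as limiting the invention.
In the description of the present invention, "several" means one or more, "a plurality" means two or more, terms such as "greater than", "less than" and "exceeding" are understood to exclude the stated number, and terms such as "above", "below" and "within" are understood to include it. Any description of "first" and "second" serves only to distinguish technical features and cannot be understood as indicating or implying relative importance, the number of the indicated technical features, or their order.
In the description of the present invention, unless otherwise expressly limited, words such as "arranged", "installed" and "connected" should be understood broadly, and those skilled in the art can reasonably determine their specific meaning in the invention in light of the specific content of the technical solution.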
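Existing semantic segmentation methods based on multispectral images have the following problems: (1) they do not consider that pixels of the same category have similar multispectral features, which blurs the semantic expression and leads to large intra-class differences, as shown in Figure 5(a); (2) they do not consider the redundant information between visible-light and non-visible-light images and introduce category context information only in the spatial dimension, so they struggle with the interference and noise caused by redundancy and suffer from small inter-class differences, as shown in Figure 5(b). For problem (1), the invention spatially partitions pixels into category areas and extracts common multispectral features for pixels of the same category. On the one hand, the invention uses supervision to obtain rough segmentation results, prompting the model to partition pixel features correctly; on the other hand, it considers same-category pixels that are far apart or differ greatly in appearance, which enhances the semantic expression of pixel features within a category and reduces intra-class differences. Finally, the invention extracts multispectral features for each category, that is, it constructs the relationship between categories and spectra. For problem (2), the invention uses the channel attention mechanism to reduce the redundant information of multispectral features in different categories. Because different categories attend to different spectral feature channels, the invention calculates channel attention scores for each category, applies different channel attention scores to pixels in different category areas, and reduces the influence of noise in multispectral features category by category, thereby increasing inter-class differences.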
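As shown in Figure 1, this embodiment provides a semantic segmentation method based on multispectral images, which specifically includes the following steps:
S1. Collect and label a multispectral data set for semantic segmentation.
As an optional implementation, step S1 includes the following steps S11-S13:
S11. Collect multispectral images from high-altitude scenes and label them to build a multispectral data set. The images contain six categories: "road", "building", "tree", "car", "low vegetation" and "clutter".
S12. Divide the labeled multispectral data set into a training set, a validation set and a test set.
S13. Crop fixed-size (512*512 pixel) images at random from the high-resolution images in the training set, and crop fixed-size (512*512 pixel) images with a sliding window from the high-resolution images in the validation set and test set.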
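The cropping in step S13 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function names are hypothetical, and the sliding-window stride is an assumption, because the text fixes only the 512*512 crop size.

```python
import random


def random_crop_origin(height, width, size=512, rng=random):
    """Pick the top-left corner of one random size x size crop (training set)."""
    if height < size or width < size:
        raise ValueError("image is smaller than the crop size")
    top = rng.randint(0, height - size)   # randint is inclusive on both ends
    left = rng.randint(0, width - size)
    return top, left


def sliding_window_origins(height, width, size=512, stride=512):
    """List top-left corners of windows covering the image (validation/test).

    The stride is an assumed parameter. The last window along each axis is
    shifted back so that it stays inside the image.
    """
    if height < size or width < size:
        raise ValueError("image is smaller than the crop size")

    def starts(length):
        positions = list(range(0, length - size + 1, stride))
        if positions[-1] != length - size:
            positions.append(length - size)
        return positions

    return [(top, left) for top in starts(height) for left in starts(width)]


if __name__ == "__main__":
    print(sliding_window_origins(1024, 1024))
    print(random_crop_origin(1024, 1024))
```

With this scheme, border windows overlap their neighbours whenever the image size is not a multiple of the stride; how overlapping predictions are merged is not specified in the text.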
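S2. Construct the semantic segmentation model. For the problem of large intra-class differences, spatial pixels are partitioned into category areas in a supervised manner and common features are extracted for pixels of the same category, which explicitly establishes the connection between categories and spectra; for the problem of small inter-class differences, the channel attention mechanism assigns weights to the feature channels of different category areas, so that the model autonomously attends to useful information and the influence of noise is reduced.
In a high-altitude scene, the semantic segmentation model extracts features from the input multispectral image, assigns a category label to each pixel, and finally obtains the semantic segmentation result. As shown in Figure 2, the semantic segmentation model mainly contains four parts: (1) encoder: extracts features from multispectral images; (2) category-spectrum correlation module: obtains preliminary segmentation results in a supervised manner and performs soft category mean pooling on multispectral features to obtain the category-spectrum relationship matrix, thereby reducing intra-class differences; (3) spectral channel enhancement module: calculates the channel attention score of each category, assigns weights to the feature channels of different category areas, and reduces the redundant information of different categories in multispectral features, thereby increasing inter-class differences; (4) decoder: decodes multispectral features and outputs the semantic segmentation result.
As an optional implementation, the encoder works as follows: the multi-channel multispectral image I_spec is input into the encoder to extract features. The encoder consists of two parts: the backbone network and the ASPP module. The backbone network is a ResNet101 model pre-trained on the ImageNet data set. Because the ResNet model downsamples five times, the resolution of its output features is 1/32 of the input image, that is, the output stride is 32, which loses a large amount of detail and degrades model performance. The backbone network therefore replaces downsampling operations with atrous convolution, which enlarges the receptive field while preventing the loss of resolution. Specifically, the downsampling operations in the last two residual blocks are replaced with atrous convolutions with dilation rates of 2 and 4 respectively, so that the output stride of the network is 8. In addition, the invention uses the ASPP module, which fuses features extracted by several convolution layers with different dilation rates, to extract multi-scale contextual features.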
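The stride arithmetic in the encoder description can be checked with a short sketch. The helper names are hypothetical; the sketch assumes that every downsampling step halves the resolution and that each removed step doubles the dilation rate needed by the layers that follow, which matches the rates of 2 and 4 given in the text.

```python
def output_stride(num_downsamplings, num_replaced=0):
    """Ratio between the input resolution and the output feature resolution."""
    if not 0 <= num_replaced <= num_downsamplings:
        raise ValueError("cannot replace more downsamplings than exist")
    return 2 ** (num_downsamplings - num_replaced)


def dilation_rates(num_replaced):
    """Dilation rates for the stages whose downsampling was removed."""
    return [2 ** (i + 1) for i in range(num_replaced)]


def feature_map_size(input_size, stride):
    """Side length of the output feature map for a square input."""
    return input_size // stride


if __name__ == "__main__":
    print(output_stride(5))                 # plain ResNet: five downsamplings
    print(output_stride(5, num_replaced=2)) # last two replaced by atrous convolution
    print(dilation_rates(2))
    print(feature_map_size(512, output_stride(5, 2)))
```

For a 512*512 crop, an output stride of 8 keeps a 64*64 feature map instead of the 16*16 map produced at stride 32.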
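As an optional implementation, the category-spectrum correlation module works as follows. First, X_h is defined as the high-level feature after the first upsampling in the decoder, which carries rich category information, and X_l is defined as the low-level feature output by the first stage of the backbone network in the encoder, which carries rich detail. The module has two steps: (1) Generate the category attention map X_p. Through supervision, the network learns an attention map for each category, that is, the probability that each pixel belongs to that category. (2) Calculate the category-spectrum relationship matrix M. Based on the self-attention mechanism, the relationship between categories and spectra is obtained through matrix operations.
Referring to Figure 3, the category attention map X_p is generated as follows: this embodiment reduces the number of channels of the high-level semantic feature X_h to the number of categories N through a convolution layer with a 1x1 kernel, and applies a softmax operation over the category dimension to obtain N attention maps, namely the category attention map X_p. To better learn the relationship between categories and spectra, the invention uses supervised learning in the training stage and introduces a loss function that drives X_p toward the semantic segmentation label.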
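The generation of the category attention map can be illustrated with a small sketch. It assumes the feature map is flattened into a list of per-pixel channel vectors, so that a 1x1 convolution becomes the same linear map applied at every pixel; the function names and the weight values are hypothetical stand-ins for learned parameters.

```python
import math


def conv1x1(pixels, weight, bias):
    """Apply a 1x1 convolution: one shared linear map per pixel.

    pixels: list of pixels, each a list of C input channels.
    weight: N rows of C values (one row per output channel); bias: N values.
    """
    return [
        [sum(w * x for w, x in zip(row, pixel)) + b for row, b in zip(weight, bias)]
        for pixel in pixels
    ]


def softmax(logits):
    """Numerically stable softmax over one pixel's category logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def category_attention_map(high_level_pixels, weight, bias):
    """Return X_p: for each pixel, a probability for each of the N categories."""
    return [softmax(logits) for logits in conv1x1(high_level_pixels, weight, bias)]


if __name__ == "__main__":
    x_h = [[1.0, 0.0], [0.0, 1.0]]                 # two pixels, C = 2
    w = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]       # N = 3 categories
    print(category_attention_map(x_h, w, [0.0, 0.0, 0.0]))
```

Because the softmax runs over the category dimension, the N values at each pixel sum to one and can be read as per-pixel category probabilities, which is what the supervision on X_p relies on.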
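Referring to Figure 3, the category-spectrum relationship matrix M is calculated as follows: the number of channels of the multispectral feature X_l is reduced through a convolution layer with a 1x1 kernel to obtain the feature X. To extract the semantic expression, the invention integrates category information into the multispectral features by extracting the features shared by pixels of the same category, thereby reducing intra-class differences. Specifically, soft category mean pooling is used: the pixel features belonging to different category areas in X_p are average pooled separately, yielding N multispectral features that form the category-spectrum relationship matrix M.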
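One plausible reading of soft category mean pooling is an attention-weighted average of the pixel features for each category. The sketch below makes that assumption explicit; the exact formula is not given in the text, and the function name and the small `eps` stabilizer are hypothetical.

```python
def soft_class_mean_pooling(attention, features, eps=1e-8):
    """Build an N x C category-spectrum matrix M (assumed formulation).

    attention: list of pixels, each with N category probabilities (X_p).
    features:  list of pixels, each with C channels (the reduced feature X).
    Row k of M is the average of the pixel features weighted by the
    probability that each pixel belongs to category k.
    """
    num_classes = len(attention[0])
    num_channels = len(features[0])
    matrix = []
    for k in range(num_classes):
        weights = [pixel_attention[k] for pixel_attention in attention]
        total = sum(weights) + eps  # eps avoids division by zero for empty classes
        matrix.append([
            sum(w * pixel[c] for w, pixel in zip(weights, features)) / total
            for c in range(num_channels)
        ])
    return matrix


if __name__ == "__main__":
    x_p = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]     # three pixels, N = 2
    x = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]]     # C = 2
    print(soft_class_mean_pooling(x_p, x))
```

With one-hot attention this reduces to an ordinary per-category mean, so pixels of the same category share one representative spectral feature regardless of where they lie in the image.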
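As an optional implementation, the spectral channel enhancement module works as follows: different spectral feature channels are attended to for different categories, and the invention uses the differences between categories (inter-category context) to enhance multispectral features and reduce the search space of other categories. The module has two main parts: (1) Calculate the spectral channel attention score A. The category-spectrum relationship matrix M is converted into a channel attention score for each category. (2) Reassign weights to the spectral feature channels. Pixels of the same category are multiplied by the corresponding channel attention score, and pixels in different category areas use different channel attention scores, which reduces the influence of noise in multispectral features and increases inter-class differences.
Referring to Figure 4, the spectral channel attention score A is calculated as follows: this embodiment treats the category-spectrum relationship matrix M as N independent spectral features and, based on the channel attention mechanism, calculates N channel attention scores separately to form the spectral channel attention score A. Specifically, the channel attention mechanism used by the invention consists of channel expansion and squeeze operations, with a squeeze ratio of 32.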
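The score computation can be sketched as a two-layer bottleneck applied to each row of M. This is an assumption in the style of squeeze-and-excitation attention: the text names only squeeze and expansion operations with a ratio of 32, so the ReLU, the sigmoid, the function names and the weight values below are all hypothetical.

```python
import math


def bottleneck_width(num_channels, ratio=32):
    """Width of the squeezed layer for a given squeeze ratio (at least 1)."""
    return max(1, num_channels // ratio)


def linear(vector, weight):
    """Multiply a vector by a weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weight]


def channel_attention(row, w_squeeze, w_expand):
    """Squeeze C channels to a bottleneck, expand back, and gate to (0, 1)."""
    hidden = [max(0.0, h) for h in linear(row, w_squeeze)]        # ReLU (assumed)
    return [1.0 / (1.0 + math.exp(-z)) for z in linear(hidden, w_expand)]  # sigmoid (assumed)


def spectral_channel_attention(matrix, w_squeeze, w_expand):
    """Return A: one C-dimensional score vector per category row of M."""
    return [channel_attention(row, w_squeeze, w_expand) for row in matrix]


if __name__ == "__main__":
    m = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0]]              # N = 2, C = 4
    w_s = [[0.1, 0.1, 0.1, 0.1], [-0.1, 0.0, 0.0, 0.1]]           # toy bottleneck of width 2
    w_e = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
    print(spectral_channel_attention(m, w_s, w_e))
```

The toy example uses a bottleneck of width 2 for readability; with the ratio of 32 stated in the text, a 256-channel feature would be squeezed to 8 channels.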
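Referring to Figure 4, the weights of the spectral feature channels are reassigned as follows: features are extracted from the low-level multispectral feature X_l through a convolution layer with a 1x1 kernel to obtain the feature X_v. Then, channel weights are reassigned to the pixel features belonging to the area of category k in X_p, and the result is concatenated with the original multispectral feature X_l to obtain the output feature Y.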
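The reweighting step can be sketched as follows, assuming a soft assignment: each pixel mixes the per-category score vectors of A according to its category probabilities in X_p, scales X_v channel by channel, and is then concatenated with X_l. The mixing rule and the function name are assumptions, since the text describes the step only in words.

```python
def reweight_and_concat(attention, scores, value_features, low_level_features):
    """Return Y: per pixel, the low-level channels followed by enhanced channels.

    attention:          pixels x N category probabilities (X_p).
    scores:             N x C channel attention scores (A).
    value_features:     pixels x C features (X_v).
    low_level_features: pixels x C_l original features (X_l).
    """
    output = []
    for pixel_attention, value, low in zip(attention, value_features, low_level_features):
        # Channel weights for this pixel: attention-weighted mix of class scores.
        weights = [
            sum(p * class_scores[c] for p, class_scores in zip(pixel_attention, scores))
            for c in range(len(value))
        ]
        enhanced = [w * v for w, v in zip(weights, value)]
        output.append(list(low) + enhanced)  # concatenation along the channel axis
    return output


if __name__ == "__main__":
    x_p = [[1, 0], [0, 1]]                 # two pixels, hard assignment for clarity
    a = [[1.0, 0.5], [0.0, 2.0]]           # N = 2, C = 2
    x_v = [[2, 4], [6, 8]]
    x_l = [[9], [7]]
    print(reweight_and_concat(x_p, a, x_v, x_l))
```

With hard (one-hot) attention, each pixel simply receives the score vector of its own category, which matches the description that pixels in different category areas use different channel attention scores.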
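As an optional implementation, the decoder works as follows: the channel-enhanced spectral feature Y is input into the decoder, concatenated with the decoder's original features, and the semantic segmentation result is output. By fusing multispectral features that are rich in detail with high-level features that are rich in semantics, this embodiment enables the model to output accurate segmentation results.
S3. Train the semantic segmentation model with the multispectral data set.
On the training set obtained from the split, the loss is calculated with the cross-entropy function, and the stochastic gradient descent algorithm updates the parameters of the network until convergence. Evaluation and testing are carried out on the validation set and the test set respectively.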
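The training objective and the parameter update can be sketched in a few lines. The cross-entropy and the plain gradient step follow the text; the weighting between the segmentation loss and the supervision on X_p, the learning rate, and all function names are hypothetical, since the text gives no values for them.

```python
import math


def cross_entropy(probabilities, labels, eps=1e-12):
    """Mean pixel-wise cross-entropy between predicted probabilities and labels.

    probabilities: list of pixels, each with N category probabilities.
    labels:        list of integer category indices, one per pixel.
    """
    return -sum(math.log(p[y] + eps) for p, y in zip(probabilities, labels)) / len(labels)


def total_loss(segmentation_loss, attention_loss, attention_weight=1.0):
    """Combine the main loss with the loss on X_p (the weight is an assumption)."""
    return segmentation_loss + attention_weight * attention_loss


def sgd_step(params, grads, learning_rate=0.01):
    """One stochastic gradient descent update on a flat list of parameters."""
    return [p - learning_rate * g for p, g in zip(params, grads)]


if __name__ == "__main__":
    predictions = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
    print(cross_entropy(predictions, [0, 1]))
    print(sgd_step([1.0, 2.0], [0.5, -1.0], learning_rate=0.1))
```

In practice the gradients would come from automatic differentiation in a deep learning framework; the sketch shows only the update rule applied once per mini-batch until the loss stops decreasing.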
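S4. Obtain an image to be processed, input it into the trained semantic segmentation model, and output the semantic segmentation result.
The semantic segmentation method based on multispectral images in high-altitude scenes proposed in this embodiment can, on the one hand, enhance the semantic expression of pixel features within a category and reduce intra-class differences and, on the other hand, reduce the influence of noise in multispectral features and thereby increase inter-class differences. Tables 1 and 2 show the comparison with the best existing methods on the Potsdam data set and the Vaihingen data set respectively. After applying this solution, the performance of the semantic segmentation model improves on both commonly used multispectral data sets.
Table 1. Comparison of the method of this embodiment with the best existing methods on the Potsdam data set
Figure PCTCN2022115291-appb-000001
Table 2. Comparison of the method of this embodiment with the best existing methods on the Vaihingen data set
Figure PCTCN2022115291-appb-000002
In summary, compared with the prior art, this embodiment has the following advantages and beneficial effects:
(1) On the one hand, the invention uses supervision to obtain rough segmentation results, prompting the model to partition pixel features correctly; on the other hand, it considers same-category pixels that are far apart or differ greatly in appearance, which enhances the semantic expression of pixel features within a category and reduces intra-class differences. Finally, the invention extracts multispectral features for each category, that is, it constructs the relationship between categories and spectra. This effectively addresses the problem of large intra-class differences.
(2) The invention uses the channel attention mechanism to reduce the redundant information of multispectral features in different categories. Because different categories attend to different spectral feature channels, the invention calculates channel attention scores for each category, applies different channel attention scores to pixels in different category areas, and reduces the influence of noise in multispectral features category by category, thereby increasing inter-class differences. This addresses the interference and noise caused by redundancy and the resulting problem of small inter-class differences.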
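This embodiment also provides a semantic segmentation device based on multispectral images, including:
at least one processor;
at least one memory for storing at least one program;
when the at least one program is executed by the at least one processor, the at least one processor implements the method shown in Figure 1.
The semantic segmentation device based on multispectral images of this embodiment can execute the semantic segmentation method based on multispectral images provided by the method embodiment of the present invention, can execute any combination of the implementation steps of the method embodiment, and has the corresponding functions and beneficial effects of the method.
The embodiments of the present application also disclose a computer program product or computer program, which includes computer instructions stored in a computer-readable storage medium. The processor of a computer device can read the computer instructions from the computer-readable storage medium and execute them, so that the computer device performs the method shown in Figure 1.
This embodiment also provides a storage medium that stores instructions or programs capable of executing the semantic segmentation method based on multispectral images provided by the method embodiment of the present invention. When the instructions or programs are run, any combination of the implementation steps of the method embodiment can be executed, with the corresponding functions and beneficial effects of the method.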
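In some alternative embodiments, the functions/operations noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, depending on the functions/operations involved, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order. In addition, the embodiments presented and described in the flow charts of the present invention are provided by way of example, to give a more comprehensive understanding of the technology. The disclosed methods are not limited to the operations and logical flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of a larger operation are performed independently.
Furthermore, although the invention is described in the context of functional modules, it should be understood that, unless stated otherwise, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be understood that a detailed discussion of the actual implementation of each module is not necessary for understanding the invention. Rather, given the attributes, functions and internal relationships of the various functional modules in the devices disclosed herein, the actual implementation of the modules will be within the routine skill of an engineer. Accordingly, those skilled in the art can, with ordinary skill, implement the invention as set forth in the claims without undue experimentation. It will also be understood that the specific concepts disclosed are merely illustrative and are not intended to limit the scope of the invention, which is determined by the full scope of the appended claims and their equivalents.
If the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the invention. The aforementioned storage media include: U disk, mobile hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), magnetic disk, optical disk and other media that can store program code.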
在流程图中表示或在此以其他方式描述的逻辑和/或步骤,例如,可以被认为是用于实现逻辑功能的可执行指令的定序列表,可以具体实现在任何计算机可读介质中,以供指令执行系统、装置或设备(如基于计算机的系统、包括处理器的系统或其他可以从指令执行系统、装置或设备取指令并执行指令的系统)使用,或结合这些指令执行系统、装置或设备而使用。就本说明书而言,“计算机可读介质”可以是任何可以包含、存储、通信、传播或传输程序以供指令执行系统、装置或设备或结合这些指令执行系统、装置或设备而使用的装置。
计算机可读介质的更具体的示例(非穷尽性列表)包括以下:具有一个或多个布线的电连接部(电子装置),便携式计算机盘盒(磁装置),随机存取存储器(RAM),只读存储器(ROM),可擦除可编辑只读存储器(EPROM或闪速存储器),光纤装置,以及便携式光盘只读存储器(CDROM)。另外,计算机可读介质甚至可以是可在其上打印所述程序的纸或其他合适的介质,因为可以例如通过对纸或其他介质进行光学扫描,接着进行编辑、解译或必要时以其他合适方式进行处理来以电子方式获得所述程序,然后将其存储在计算机存储器中。
应当理解,本发明的各部分可以用硬件、软件、固件或它们的组合来实现。在上述实施方式中,多个步骤或方法可以用存储在存储器中且由合适的指令执行系统执行的软件或固件 来实现。例如,如果用硬件来实现,和在另一实施方式中一样,可用本领域公知的下列技术中的任一项或他们的组合来实现:具有用于对数据信号实现逻辑功能的逻辑门电路的离散逻辑电路,具有合适的组合逻辑门电路的专用集成电路,可编程门阵列(PGA),现场可编程门阵列(FPGA)等。
在本说明书的上述描述中,参考术语“一个实施方式/实施例”、“另一实施方式/实施例”或“某些实施方式/实施例”等的描述意指结合实施方式或示例描述的具体特征、结构、材料或者特点包含于本发明的至少一个实施方式或示例中。在本说明书中,对上述术语的示意性表述不一定指的是相同的实施方式或示例。而且,描述的具体特征、结构、材料或者特点可以在任何的一个或多个实施方式或示例中以合适的方式结合。
尽管已经示出和描述了本发明的实施方式,本领域的普通技术人员可以理解:在不脱离本发明的原理和宗旨的情况下可以对这些实施方式进行多种变化、修改、替换和变型,本发明的范围由权利要求及其等同物限定。
以上是对本发明的较佳实施进行了具体说明,但本发明并不限于上述实施例,熟悉本领域的技术人员在不违背本发明精神的前提下还可做作出种种的等同变形或替换,这些等同的变形或替换均包含在本申请权利要求所限定的范围内。

Claims (10)

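  1. A semantic segmentation method based on multispectral images, characterized by comprising the following steps:
    collecting and annotating a multispectral dataset for semantic segmentation;
    constructing a semantic segmentation model;
    training the semantic segmentation model on the multispectral dataset;
    acquiring an image to be processed, inputting the image to be processed into the trained semantic segmentation model, and outputting a semantic segmentation result;
    wherein the semantic segmentation model comprises a category-spectrum association module, the category-spectrum association module being configured to increase the similarity between pixels of the same category and reduce the differences between classes, so as to obtain continuous and accurate segmentation results.
  2. The semantic segmentation method based on multispectral images according to claim 1, characterized in that the semantic segmentation model further comprises a spectral channel enhancement module;
    the spectral channel enhancement module is configured to use a channel attention mechanism to attend to the important information in the multispectral features, so as to reduce the redundant information of different categories in the multispectral features.
  3. The semantic segmentation method based on multispectral images according to claim 1, characterized in that collecting and annotating the multispectral dataset for semantic segmentation comprises:
    collecting multispectral images from high-altitude scenes and annotating the multispectral images to construct a multispectral dataset;
    dividing the annotated multispectral dataset into three parts: a training set, a validation set, and a test set;
    cropping fixed-size images from the high-resolution images in the training set in a random manner, and cropping fixed-size images from the high-resolution images in the validation set and the test set in a sliding-window manner.
  4. The semantic segmentation method based on multispectral images according to claim 1, characterized in that the semantic segmentation model further comprises an encoder, a spectral channel enhancement module, and a decoder;
    the encoder is configured to extract features from the multispectral image;
    the category-spectrum association module is configured to obtain a preliminary segmentation result in a supervised manner and to perform soft category mean pooling on the multispectral features to obtain a category-spectrum relation matrix, thereby reducing intra-class differences;
    the spectral channel enhancement module is configured to compute a channel attention score for each category and assign weights to the feature channels of the regions of different categories, reducing the redundant information of different categories in the multispectral features and thereby increasing inter-class differences;
    the decoder is configured to decode the multispectral features and output the semantic segmentation result.
  5. The semantic segmentation method based on multispectral images according to claim 4, characterized in that X_h is defined as the high-level feature after the first upsampling in the decoder, and X_l is defined as the low-level feature output by the first stage of the backbone network in the encoder;
    the workflow of the category-spectrum association module is as follows:
    reducing the number of channels of the high-level feature X_h to the number of categories N, and applying a softmax operation along the category dimension to obtain N attention maps, which serve as the category attention map X_p;
    reducing the number of channels of the low-level feature X_l to obtain a feature X;
    performing mean pooling separately over the pixel features belonging to the regions of different categories in the category attention map X_p to obtain N multispectral features, which, in combination with the feature X, form the category-spectrum relation matrix M.
  6. The semantic segmentation method based on multispectral images according to claim 5, characterized in that the workflow of the spectral channel enhancement module is as follows:
    treating the category-spectrum relation matrix M as N independent spectral features, computing N channel attention scores separately based on the channel attention mechanism, and forming the spectral channel attention score A;
    performing feature extraction on the low-level feature X_l to obtain a feature X_v;
    according to the spectral channel attention score A and the feature X_v, reassigning channel weights to the pixel features in the region of the category attention map X_p corresponding to category k, and concatenating the result with the low-level feature X_l to obtain an output feature Y.
  7. The semantic segmentation method based on multispectral images according to claim 4, characterized in that the encoder comprises a backbone network and an ASPP module;
    wherein the backbone network uses dilated convolution in place of downsampling operations, so as to enlarge the receptive field while preventing a loss of resolution;
    the ASPP module is configured to fuse the features extracted by multiple convolutional layers with different dilation rates in order to extract multi-scale contextual features.
  8. The semantic segmentation method based on multispectral images according to claim 1, characterized in that training the semantic segmentation model on the multispectral dataset comprises:
    computing the loss with a cross-entropy function, and updating the network parameters with the stochastic gradient descent algorithm until convergence.
  9. A semantic segmentation device based on multispectral images, characterized by comprising:
    at least one processor;
    at least one memory for storing at least one program;
    wherein, when the at least one program is executed by the at least one processor, the at least one processor implements the method according to any one of claims 1 to 8.
  10. A computer-readable storage medium storing a processor-executable program, characterized in that the processor-executable program, when executed by a processor, is used to perform the method according to any one of claims 1 to 8.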
PCT/CN2022/115291 2022-05-17 2022-08-26 Semantic segmentation method and device based on multispectral images, and storage medium WO2023221328A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210533579.2A CN115082492A (zh) 2022-05-17 2022-05-17 Semantic segmentation method and device based on multispectral images, and storage medium
CN202210533579.2 2022-05-17

Publications (1)

Publication Number Publication Date
WO2023221328A1 true WO2023221328A1 (zh) 2023-11-23

Family

ID=83246686

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/115291 WO2023221328A1 (zh) 2022-05-17 2022-08-26 Semantic segmentation method and device based on multispectral images, and storage medium

Country Status (2)

Country Link
CN (1) CN115082492A (zh)
WO (1) WO2023221328A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180373932A1 (en) * 2016-12-30 2018-12-27 International Business Machines Corporation Method and system for crop recognition and boundary delineation
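CN113327250A (zh) * 2021-05-28 2021-08-31 深圳前海微众银行股份有限公司 Multispectral image segmentation method and device, electronic device, and storage medium
CN113762264A (zh) * 2021-08-26 2021-12-07 南京航空航天大学 Multispectral image semantic segmentation method with multi-encoder fusion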

Also Published As

Publication number Publication date
CN115082492A (zh) 2022-09-20

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22942357

Country of ref document: EP

Kind code of ref document: A1