CN116612385B

CN116612385B - Remote sensing image multiclass information extraction method and system based on depth high-resolution relation graph convolution

Info

Publication number: CN116612385B
Application number: CN202310578883.3A
Authority: CN
Inventors: 陈嘉辉; 彭玲; 王寅达; 杨丽娜
Original assignee: Aerospace Information Research Institute of CAS
Current assignee: Aerospace Information Research Institute of CAS
Priority date: 2023-05-22
Filing date: 2023-05-22
Publication date: 2024-01-26
Anticipated expiration: 2043-05-22
Also published as: CN116612385A

Abstract

The invention discloses a method, a system, and a computer storage medium for extracting multi-category information from remote sensing images based on deep high-resolution relationship graph convolution. The method includes: S1: dividing the feature maps of different sizes according to the SLIC superpixel segmentation results of the original RGB image at the corresponding size; S2: segmenting the feature map to obtain K categories, so that each pixel has a corresponding SLIC category; S3: learning by constructing the images of different resolutions into a heterogeneous graph with two relationships as edges and SLIC categories as nodes; S4: restoring the graph neural networks of different dimensions into feature maps and, together with a fully connected layer, classifying each pixel to extract the target categories. The invention uses high-resolution features and the relationship information in the generated graph structure, explicitly combines the multi-label classification scenario with graph learning, and provides a better way to model heterogeneous graphs with multiple classes.

Description

Method and system for multi-category information extraction from remote sensing images based on deep high-resolution relationship graph convolution

Technical field

The invention relates to the fields of remote sensing, deep-learning-based computer vision, and graph learning, and more specifically to a method and system for extracting multi-category information from remote sensing images based on deep high-resolution relationship graph convolution.

Background art

In recent years, semantic segmentation based on deep learning has become the mainstream approach. Semantic segmentation classifies the pixels of an image one by one in order to extract the desired information categories. For remote sensing images, traditional information extraction methods still rely mainly on conventional convolutional neural network frameworks to extract features. However, convolution has a limited field of view and mainly explores feature relationships among pixels in a small local area, which limits the ability to capture long-range dependency features of different labels in high-resolution remote sensing images under multi-label classification tasks.

Fully Convolutional Networks (FCNs) [Fully Convolutional Networks for Semantic Segmentation] were the first to remove the fully connected layers and introduce end-to-end training for semantic segmentation. However, the downsampled feature maps of FCNs destroy spatial information, so the segmentation results easily lose boundary information. SegNet [SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation] uses an encoder-decoder structure and upsamples feature maps with the max-pooling position indices, so the spatial information and high-frequency features lost to max pooling can be recovered. U-Net [U-Net: Convolutional Networks for Biomedical Image Segmentation] upsamples feature maps with learnable transposed convolutions instead of interpolation, and uses skip connections so that the decoder knows what information was lost to max pooling at each stage. The DeepLab [Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs] series of work focuses mainly on atrous convolution, which enlarges the receptive field of the convolution kernel while maintaining the spatial resolution of the image. The Atrous Spatial Pyramid Pooling (ASPP) module proposed in [DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs] effectively captures multi-scale contextual semantic information. However, this method performs poorly on small-scale targets and easily loses boundary information. To address the loss of spatial information, Lin et al. proposed RefineNet [Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation], which feeds each layer of the encoder's feature maps into a refine block. The refine block fully integrates spatial information at different resolutions, so RefineNet performs image semantic segmentation with high accuracy.

However, convolutional neural networks (CNNs) can only attend to small-scale, local information in an image, so the above methods have difficulty capturing long-range dependency features in high-resolution remote sensing images. To overcome this problem, Liang et al. [A Deep Neural Network Combined CNN and GCN for Remote Sensing Scene Classification] introduced superpixel nodes to generate a graph from the image, but did not consider graph models with multiple labels. In remote sensing image scene classification, Li et al. [Multi-Label Remote Sensing Image Scene Classification by Combining a Convolutional Neural Network and a Graph Neural Network] used superpixels to generate a graph structure for the image, and achieved good results by combining long-range high-level semantic information with graph convolutional networks (GNNs). However, that work does not explicitly combine the multi-label classification scenario with graph learning, and there are better ways to model heterogeneous graphs with multiple classes.

Inspired by the High-Resolution Network (HRNet) [Deep High-Resolution Representation Learning for Human Pose Estimation] and the Relational Graph Convolutional Network (R-GCN) [Modeling Relational Data with Graph Convolutional Networks], the present invention proposes a multi-category information extraction method for remote sensing images based on deep high-resolution relationship graph convolution, which uses high-resolution features and the relationship information in the generated graph structure.

Summary of the invention

In view of the above problems, the present invention provides a method for extracting multi-category information from remote sensing images based on deep high-resolution relationship graph convolution, which aims to solve the problem that existing methods do not explicitly combine the multi-label classification scenario with graph learning.

To achieve the above objective, according to one aspect of the present invention, a method for extracting multi-category information from remote sensing images based on deep high-resolution relationship graph convolution is provided. The method includes the following steps:

S1: Divide the feature maps of different sizes according to the SLIC superpixel segmentation results of the original RGB image at the corresponding size. Specifically, for images of different sizes, the number of clusters K is set manually. Assuming the original image has N pixels, the image is first divided into blocks of equal size, each of size S. Within each block, a cluster center is chosen at random. To avoid a sampling point that lies on an edge or in a noisy part of the image, the sampling point is adjusted manually among the neighboring points according to the pixel gradient, and the point with the smallest gradient is selected as the cluster center. Within a 2S*2S range, the color distance d_c and the spatial distance d_s between each pixel and the cluster center are computed. Here, point i has the value (l_i, a_i, b_i) in RGB coordinates and point j has the value (l_j, a_j, b_j); point i has the value (x_i, y_i) in spatial coordinates and point j has the value (x_j, y_j). The formulas for d_c and d_s are as follows:
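The formula image for d_c and d_s is not reproduced in this text. A minimal sketch, assuming the distances are the plain Euclidean distances used by standard SLIC (which matches the symbol definitions above), is shown below. The function names are illustrative, and `grid_interval` (S = sqrt(N / K)) is the standard SLIC convention rather than a value quoted from the patent.

```python
import math

def color_distance(ci, cj):
    """Euclidean distance d_c between two color triples (l, a, b)."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(ci, cj)))

def spatial_distance(pi, pj):
    """Euclidean distance d_s between two pixel positions (x, y)."""
    return math.sqrt((pi[0] - pj[0]) ** 2 + (pi[1] - pj[1]) ** 2)

def grid_interval(n_pixels, k_clusters):
    """Grid step S of standard SLIC (assumed, not quoted from the patent)."""
    return math.sqrt(n_pixels / k_clusters)
```

How the two distances are combined into a single assignment score is not stated in the text, so the sketch keeps them separate.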

After the distances are computed, each pixel updates the image block to which it belongs. The pixels of each image block are averaged to obtain a new cluster center, and the preceding steps are repeated until the distance between the cluster centers of two successive iterations is less than a set threshold.
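The update-and-stop rule described above can be sketched as follows. This is an illustration under assumptions: each pixel is represented as a vector (x, y, l, a, b), the stopping test uses the largest center displacement, and the function names are hypothetical.

```python
import math

def update_centers(pixels, labels, k):
    """Average the (x, y, l, a, b) vectors of the pixels assigned to each cluster."""
    dim = len(pixels[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for vec, lab in zip(pixels, labels):
        counts[lab] += 1
        for d in range(dim):
            sums[lab][d] += vec[d]
    # An empty cluster gets a zero vector here; real code would keep its old center.
    return [[s / c if c else 0.0 for s in row] for row, c in zip(sums, counts)]

def max_center_shift(old, new):
    """Largest Euclidean displacement of any center between two iterations."""
    return max(math.dist(o, n) for o, n in zip(old, new))

def converged(old, new, threshold):
    """True when every center moved less than the set threshold."""
    return max_center_shift(old, new) < threshold
```

A driver loop would alternate pixel reassignment with `update_centers` and stop once `converged` returns true.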

S2: Segmenting the feature map yields K categories, and each pixel has a corresponding SLIC category. The K categories are taken as K nodes. The feature of a node is the feature at the position of its pixels in the feature map, so the number of channels of the feature map equals the number of feature dimensions of the node. Edges are added between adjacent categories. There are two types of edges, determined by computing the similarity between nodes. The calculation is as follows:

Here, a and b are two n-dimensional vectors, a_i is the i-th feature of node a, b_i is the i-th feature of node b, and n depends on the number of channels of the current feature map. After normalization, an edge is assigned to class 1 if the value is greater than 0.5 and to class 0 otherwise, so that sets of similar pixels and sets of dissimilar pixels can be learned.
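The similarity formula itself is missing from this text. The sketch below assumes cosine similarity, because it fits the symbol definitions (two n-dimensional vectors compared feature by feature) and produces a value that can be thresholded at 0.5. It also reads "normalization" as the division by the two vector norms. Both readings are assumptions, and the function names are illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two n-dimensional feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is treated as dissimilar
    return dot / (norm_a * norm_b)

def edge_class(a, b, threshold=0.5):
    """Return 1 (similar) if the similarity exceeds the threshold, else 0."""
    return 1 if cosine_similarity(a, b) > threshold else 0
```

For example, `edge_class([1, 0], [1, 0])` gives a class 1 edge, while two orthogonal feature vectors give a class 0 edge.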

S3: For images of different resolutions, the feature similarity between SLIC categories is computed and divided into two relationships, similar and dissimilar, and a heterogeneous graph with the two relationships as edges and the SLIC categories as nodes is constructed for learning. Because the edges are divided into two classes according to node similarity, a different feature transformation matrix is learned for each edge type. The formula is as follows:

Here, $h_i^{(l)}$ is the embedding of node i at layer l, $h_i^{(l+1)}$ is the embedding of node i at layer l+1, $\mathcal{N}_i^r$ is the set of neighbor nodes of node i under the r-th relationship, $c_{i,r}$ is a constant, and $W_r^{(l)}$ is the feature transformation matrix under relationship r at layer l. When R = 0, $W_0^{(l)}$ is the transformation matrix that maps a node's own features to its next layer. R denotes the type of relationship: R = 1 means "similar" and R = 2 means "dissimilar".
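One layer of this relation-specific update can be sketched in plain Python as follows. This is a sketch under assumptions: the normalization constant is taken as the neighbor count under each relationship, ReLU is used as the activation, and the function and parameter names are illustrative. None of these choices is stated in the text.

```python
def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

def rgcn_layer(h, neighbors, weights, self_weight):
    """One relational graph convolution update.

    h           : list of node embeddings at layer l
    neighbors   : neighbors[r][i] = neighbors of node i under relationship r
    weights     : weights[r] = transformation matrix for relationship r
    self_weight : self-connection matrix (the R = 0 case in the text)
    """
    out = []
    for i, h_i in enumerate(h):
        acc = matvec(self_weight, h_i)
        for r, w_r in weights.items():
            nbrs = neighbors[r][i]
            if not nbrs:
                continue
            c_ir = len(nbrs)  # assumption: c_{i,r} is the neighbor count
            for j in nbrs:
                msg = matvec(w_r, h[j])
                acc = [a + m / c_ir for a, m in zip(acc, msg)]
        out.append([max(0.0, a) for a in acc])  # ReLU assumed as activation
    return out
```

Keeping one matrix per relationship is what lets the "similar" and "dissimilar" edges contribute differently to a node's next embedding.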

S4: Restore the graph neural networks of different dimensions into feature maps and, together with a fully connected layer, classify each pixel to extract the target categories.
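A minimal sketch of this restoration step is given below. It assumes that every pixel receives the embedding of the SLIC node it belongs to and that the fully connected layer is a single linear layer shared across pixels, followed by an arg-max. The text does not say how the maps from the different resolutions are merged, so the sketch handles one resolution only. All names are illustrative.

```python
def nodes_to_feature_map(label_map, node_embeddings):
    """Give every pixel the embedding of the superpixel node it belongs to."""
    return [[node_embeddings[lab] for lab in row] for row in label_map]

def classify_pixels(feature_map, weight, bias):
    """Apply one shared linear layer per pixel and take the arg-max class."""
    result = []
    for row in feature_map:
        out_row = []
        for feat in row:
            scores = [sum(w * f for w, f in zip(w_c, feat)) + b_c
                      for w_c, b_c in zip(weight, bias)]
            out_row.append(scores.index(max(scores)))
        result.append(out_row)
    return result
```

Here `weight` holds one row of coefficients per target class and `bias` one offset per class.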

In another aspect, the present invention provides a system for extracting multi-category information from remote sensing images based on deep high-resolution relationship graph convolution. The system specifically includes:

A segmentation module, configured to divide the feature maps of different sizes according to the SLIC superpixel segmentation results of the original RGB image at the corresponding size. Specifically, for images of different sizes, the number of clusters K is set manually; assuming the original image has N pixels, the image is first divided into blocks of equal size, each of size S;

A classification module, configured to divide the feature map into K categories according to the segmentation module, with each pixel having a corresponding SLIC category;

A learning module, configured to compute, for images of different resolutions, the feature similarity between SLIC categories, divide it into two relationships, similar and dissimilar, and construct a heterogeneous graph with the two relationships as edges and the SLIC categories as nodes for learning;

An extraction module, configured to restore the graph neural networks of different dimensions into feature maps and, together with a fully connected layer, classify each pixel to extract the target categories.

In another aspect, the present invention provides a computer storage medium storing computer program instructions which, when executed by a processor, implement any of the above methods for extracting multi-category information from remote sensing images based on deep high-resolution relationship graph convolution.

The proposed method was tested on the public Potsdam and Vaihingen datasets and compared with state-of-the-art methods. Under two accuracy metrics, F1 (F1-score) and IoU (Intersection over Union), the method outperformed all compared methods.
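For reference, the two metrics can be computed per class from true-positive, false-positive, and false-negative pixel counts. The sketch below uses the standard definitions; the function names are illustrative and no result from the patent is reproduced.

```python
def class_counts(pred, truth, cls):
    """Count TP, FP, FN for one class over two flat label sequences."""
    tp = sum(1 for p, t in zip(pred, truth) if p == cls and t == cls)
    fp = sum(1 for p, t in zip(pred, truth) if p == cls and t != cls)
    fn = sum(1 for p, t in zip(pred, truth) if p != cls and t == cls)
    return tp, fp, fn

def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when the class never occurs."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def iou(tp, fp, fn):
    """IoU = TP / (TP + FP + FN); returns 0.0 when the class never occurs."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0
```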

Description of the drawings

To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below serve only to illustrate preferred embodiments and are not to be construed as limiting the invention. Throughout the drawings, the same reference symbols denote the same components. In the drawings:

Figure 1 is a schematic diagram of multi-category information extraction from remote sensing images based on deep high-resolution relationship graph convolution;

Figure 2 shows the category map obtained by segmenting the feature map.

Detailed description

To make the above objectives, features, and advantages of the present application clearer and easier to understand, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings and specific embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

As shown in Figure 1, HRNet was originally proposed to solve the human pose estimation problem. The network contains four parallel subnetworks, each representing a different resolution. As the depth increases, low-resolution subnetworks are gradually added alongside the high-resolution subnetwork, and the multi-resolution subnetworks are connected in parallel. Unlike encoder-decoder architectures, the network does not need to upsample from downsampled feature maps. Because HRNet maintains a high-resolution feature representation throughout, it intuitively brings richer semantic features to the semantic segmentation task. In previous practice, the feature maps of the four sizes are finally resampled to the same dimension before prediction. Here, instead, the feature maps of different sizes are divided according to the SLIC superpixel segmentation results of the original RGB image at the corresponding size.

Specifically, for images of different sizes, the number of clusters K is set manually. Assuming the original image has N pixels, we first divide the image into blocks of equal size, each of size S. Within each block, a cluster center is chosen at random. To avoid a sampling point that lies on an edge or in a noisy part of the image, we adjust the sampling point manually among the neighboring points according to the pixel gradient and select the point with the smallest gradient as the cluster center. Within a 2S*2S range, the color distance d_c and the spatial distance d_s between each pixel and the cluster center are computed. Here, point i has the value (l_i, a_i, b_i) in RGB coordinates and point j has the value (l_j, a_j, b_j); point i has the value (x_i, y_i) in spatial coordinates and point j has the value (x_j, y_j). The formulas for d_c and d_s are as follows:
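The seed adjustment described in this paragraph (moving each sampling point to the lowest-gradient position nearby) can be sketched as below. The window size is not given in the text; a 3x3 window (`radius=1`), a grayscale input, and a squared central-difference gradient are assumptions borrowed from common SLIC practice, and the function names are illustrative.

```python
def gradient_sq(gray, x, y):
    """Squared central-difference gradient of a grayscale image at (x, y)."""
    h, w = len(gray), len(gray[0])
    x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
    y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
    gx = gray[y][x1] - gray[y][x0]
    gy = gray[y1][x] - gray[y0][x]
    return gx * gx + gy * gy

def adjust_seed(gray, x, y, radius=1):
    """Move a seed to the lowest-gradient pixel in its (2*radius+1)^2 window."""
    h, w = len(gray), len(gray[0])
    best = (gradient_sq(gray, x, y), x, y)
    for ny in range(max(y - radius, 0), min(y + radius, h - 1) + 1):
        for nx in range(max(x - radius, 0), min(x + radius, w - 1) + 1):
            cand = (gradient_sq(gray, nx, ny), nx, ny)
            if cand[0] < best[0]:  # strict: ties keep the earlier position
                best = cand
    return best[1], best[2]
```

A seed that starts next to a sharp intensity step is moved into the flat region beside it, which is the behavior the paragraph asks for.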

In this way, segmenting the feature map yields K categories, as shown in Figure 2, and each pixel has a corresponding SLIC category. The K categories are taken as K nodes. The feature of a node is the feature at the position of its pixels in the feature map, so the number of channels of the feature map equals the number of feature dimensions of the node. Edges are added between adjacent categories. There are two types of edges, determined by computing the similarity between nodes. The calculation is as follows:
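The node and edge construction described in this paragraph can be sketched as follows. The text does not say how the features of the many pixels in one category become a single node feature, so the sketch assumes a per-category mean. It also assumes that "adjacent" means 4-connected neighboring pixels with different labels. The function name is illustrative.

```python
def build_graph(label_map, feature_map, k):
    """Return (node_features, edges) for a superpixel label map.

    node_features[c] : mean feature vector of the pixels labelled c (assumption)
    edges            : set of (c1, c2) pairs with c1 < c2 for touching labels
    """
    channels = len(feature_map[0][0])
    sums = [[0.0] * channels for _ in range(k)]
    counts = [0] * k
    edges = set()
    h, w = len(label_map), len(label_map[0])
    for y in range(h):
        for x in range(w):
            c = label_map[y][x]
            counts[c] += 1
            for d in range(channels):
                sums[c][d] += feature_map[y][x][d]
            # Look right and down so each neighbouring pair is visited once.
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < h and nx < w and label_map[ny][nx] != c:
                    other = label_map[ny][nx]
                    edges.add((min(c, other), max(c, other)))
    feats = [[s / n if n else 0.0 for s in row] for row, n in zip(sums, counts)]
    return feats, edges
```

Each edge found this way would then be typed as similar or dissimilar by comparing the two node feature vectors.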

Here, a and b are two n-dimensional vectors, a_i is the i-th feature of node a, b_i is the i-th feature of node b, and n depends on the number of channels of the current feature map. After normalization, an edge is assigned to class 1 if the value is greater than 0.5 and to class 0 otherwise, so that sets of similar pixels and sets of dissimilar pixels can be learned. Learning by constructing the images of different resolutions into a heterogeneous graph with two relationships as edges and SLIC categories as nodes has two advantages: 1) The graph structure extends the scale at which information is updated across feature maps of different resolutions, beyond small, regular local regions. The objects of convolution are no longer pixels in a fixed region but nodes obtained by SLIC classification, which have a larger field of view. Each such node corresponds to a patch of pixels with similar features, and the patch may have an irregular shape. A convolution operation on the graph structure therefore performs feature learning over multiple large patches of pixels. 2) A heterogeneous graph based on feature similarity aggregates pixels into category sets, which improves the classification accuracy of the target pixels. Once the graph is constructed, learning follows the steps of a multi-relational graph neural network. Because the edges are divided into two classes according to node similarity, a different feature transformation matrix is learned for each edge type. The formula is as follows:

Here, $h_i^{(l)}$ is the embedding of node i at layer l, $h_i^{(l+1)}$ is the embedding of node i at layer l+1, $\mathcal{N}_i^r$ is the set of neighbor nodes of node i under the r-th relationship, $c_{i,r}$ is a constant, and $W_r^{(l)}$ is the feature transformation matrix under relationship r at layer l. When R = 0, $W_0^{(l)}$ is the transformation matrix that maps a node's own features to its next layer. R denotes the type of relationship: R = 1 means "similar" and R = 2 means "dissimilar".
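The formula image that these definitions refer to is not reproduced in this text. For reference only, the propagation rule of the cited R-GCN work is written out below. The symbol names are assumptions chosen to match the definitions: $h_i^{(l)}$ is the embedding of node i at layer l, $\mathcal{N}_i^r$ is the neighbor set of node i under relationship r, $c_{i,r}$ is the normalization constant, $W_r^{(l)}$ is the relationship-specific matrix, $W_0^{(l)}$ is the self-connection matrix, and $\sigma$ is an activation function. The patent's own formula may differ in detail.

```latex
h_i^{(l+1)} = \sigma\left( \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_i^{r}} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)} \right)
```

With the two relationships of this method, the outer sum runs over "similar" and "dissimilar" edges, each with its own matrix.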

Finally, the graph neural networks of different dimensions are restored into feature maps and, together with a fully connected layer, each pixel is classified to extract the target categories.

The segmentation module is configured to divide the feature maps of different sizes according to the SLIC superpixel segmentation results of the original RGB image at the corresponding size. Specifically, for images of different sizes, the number of clusters K is set manually. Assuming the original image has N pixels, the image is first divided into blocks of equal size, each of size S. Within each block, a cluster center is chosen at random. To avoid a sampling point that lies on an edge or in a noisy part of the image, the sampling point is adjusted manually among the neighboring points according to the pixel gradient, and the point with the smallest gradient is selected as the cluster center. Within a 2S*2S range, the color distance d_c and the spatial distance d_s between each pixel and the cluster center are computed. Here, point i has the value (l_i, a_i, b_i) in RGB coordinates and point j has the value (l_j, a_j, b_j); point i has the value (x_i, y_i) in spatial coordinates and point j has the value (x_j, y_j). The formulas for d_c and d_s are as follows:

The classification module is configured to divide the feature map into K categories according to the segmentation module, with each pixel having a corresponding SLIC category. The K categories are taken as K nodes. The feature of a node is the feature at the position of its pixels in the feature map, so the number of channels of the feature map equals the number of feature dimensions of the node. Edges are added between adjacent categories. There are two types of edges, determined by computing the similarity between nodes. The calculation is as follows:

The learning module is configured to compute, for images of different resolutions, the feature similarity between SLIC categories, divide it into two relationships, similar and dissimilar, and construct a heterogeneous graph with the two relationships as edges and the SLIC categories as nodes for learning. The K categories are taken as K nodes. The feature of a node is the feature at the position of its pixels in the feature map, so the number of channels of the feature map equals the number of feature dimensions of the node. Because the edges are divided into two classes according to node similarity, a different feature transformation matrix is learned for each edge type. The formula is as follows:

The extraction module is configured to restore the graph neural networks of different dimensions into feature maps and, together with a fully connected layer, classify each pixel to extract the target categories.

本申请实施例还提供了一种计算机可读存储介质，计算机可读存储介质上存储计算机程序，计算机程序被处理器执行时实现上述一种基于深度高分辨率关系图卷积的遥感影像多类信息提取方法实施例的各个过程，且能达到相同的技术效果，为避免重复，这里不再赘述。Embodiments of the present application also provide a computer-readable storage medium. A computer program is stored on the computer-readable storage medium. When the computer program is executed by a processor, it implements the above-mentioned multi-class remote sensing image based on deep high-resolution relationship graph convolution. Each process of the information extraction method embodiment can achieve the same technical effect. To avoid duplication, it will not be described again here.

本说明书中的各个实施例均采用递进的方式描述，每个实施例重点说明的都是与其他实施例的不同之处，各个实施例之间相同相似的部分互相参见即可。Each embodiment in this specification is described in a progressive manner. Each embodiment focuses on its differences from other embodiments. The same and similar parts between the various embodiments can be referred to each other.

本领域内的技术人员应明白，本申请实施例的实施例可提供为方法、装置、或计算机存储介质。因此，本申请实施例可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且，本申请实施例可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art should understand that embodiments of the embodiments of the present application may be provided as methods, devices, or computer storage media. Therefore, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment that combines software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

本申请实施例是参照根据本申请实施例的方法、终端设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理终端设备的处理器以产生一个机器，使得通过计算机或其他可编程数据处理终端设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。Embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the present application. It will be understood that each process and/or block in the flowchart illustrations and/or block diagrams, and combinations of processes and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine such that the instructions are executed by the processor of the computer or other programmable data processing terminal device. Means are generated for implementing the functions specified in the process or processes of the flowchart diagrams and/or the block or blocks of the block diagrams.

这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理终端设备以特定方式工作的计算机可读存储器中，使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品，该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。These computer program instructions may also be stored in a computer-readable memory that causes a computer or other programmable data processing terminal equipment to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including the instruction means, the The instruction means implements the functions specified in a process or processes of the flowchart and/or a block or blocks of the block diagram.

这些计算机程序指令也可装载到计算机或其他可编程数据处理终端设备上，使得在计算机或其他可编程终端设备上执行一系列操作步骤以产生计算机实现的处理，从而在计算机或其他可编程终端设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。These computer program instructions can also be loaded onto a computer or other programmable data processing terminal equipment, so that a series of operating steps are performed on the computer or other programmable terminal equipment to produce computer-implemented processing, thereby causing the computer or other programmable terminal equipment to perform a computer-implemented process. The instructions executed on provide steps for implementing the functions specified in a process or processes of the flow diagrams and/or a block or blocks of the block diagrams.

Although preferred embodiments of the present application have been described, those skilled in the art may make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present application.

Claims

1. A remote sensing image multi-class information extraction method based on deep high-resolution relation graph convolution, characterized by comprising the following steps:

S1: dividing feature maps of different dimensions according to the SLIC superpixel segmentation result of the original RGB image of the corresponding size, specifically: after the number of clusters K is manually set for images of different sizes, and assuming the original image has N pixels, the image is first divided into blocks of equal size, each block being of size S;

S2: segmenting the feature map to obtain K categories, wherein each pixel has a corresponding SLIC category;

S3: for images of different resolutions, calculating the feature similarity between SLIC categories and dividing it into two relations, similar and dissimilar, and constructing a heterogeneous graph with the two relations as edges and the SLIC categories as nodes for learning;

S4: restoring the graph neural networks of different dimensions into feature maps, and completing the classification of each pixel in cooperation with a fully connected layer, so as to extract the target categories;

wherein constructing the heterogeneous graph with the two relations as edges and the SLIC categories as nodes for learning specifically comprises:

edges are added between connected categories; there are two classes of edges, and the class of an edge is determined by calculating the similarity between its nodes, specifically as follows:

wherein a and b are two n-dimensional vectors, $a_i$ denotes the i-th dimension feature of node a, $b_i$ denotes the i-th dimension feature of node b, and the value of n depends on the number of channels of the feature map at that point; after normalization, if the value is greater than 0.5 the edge is determined to be a class 1 edge, otherwise a class 0 edge, so that sets of similar pixels and sets of dissimilar pixels are learned;

and

in the multi-relation graph neural network learning step, the edges are divided into two types according to node similarity, and a different feature transformation matrix is learned for each edge type, according to the following formula:

$$h_i^{(l+1)} = \sigma\left( \sum_{r \in R} \sum_{j \in N_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)} \right)$$

wherein $h_i^{(l)}$ is the embedding of node i at layer l, $h_i^{(l+1)}$ is the embedding of node i at layer l+1, $N_i^r$ denotes the set of neighbor nodes of node i under the r-th relation, $c_{i,r}$ is a constant, and $W_r^{(l)}$ denotes the feature transformation matrix of layer l under relation r; when r = 0, $W_0^{(l)}$ denotes the transformation matrix that carries the node's own features to the next layer; R denotes the set of relation types, where r = 1 denotes "similar" and r = 2 denotes "dissimilar".
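The edge typing and relational update of claim 1 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the similarity formula is not reproduced in this text, so cosine similarity rescaled to [0, 1] is assumed; the constant $c_{i,r}$ is assumed to be the number of neighbors of node i under relation r; ReLU is assumed as the activation; and the function names `edge_types` and `rgcn_layer` are hypothetical.

```python
import numpy as np


def edge_types(features, adjacency):
    """Split existing edges into "similar" and "dissimilar" relations.

    features:  (K, n) array, one n-dimensional feature vector per SLIC node.
    adjacency: (K, K) boolean array, True where two nodes are connected.
    Assumption: similarity is cosine similarity rescaled from [-1, 1] to [0, 1].
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)   # guard against zero vectors
    sim = (unit @ unit.T + 1.0) / 2.0            # normalized similarity in [0, 1]
    similar = adjacency & (sim > 0.5)            # class 1 edges
    dissimilar = adjacency & ~(sim > 0.5)        # class 0 edges
    return similar, dissimilar


def rgcn_layer(h, relations, weights, self_weight):
    """One relational graph convolution step with a ReLU activation.

    h:           (K, d_in) node embeddings, one row per node.
    relations:   list of (K, K) boolean adjacency matrices, one per relation.
    weights:     list of (d_in, d_out) matrices, one per relation.
    self_weight: (d_in, d_out) matrix for the node's own features (r = 0).
    Assumption: c_{i,r} is the neighbor count of node i under relation r.
    """
    out = h @ self_weight                        # self-connection term
    for adj, w in zip(relations, weights):
        a = adj.astype(h.dtype)
        degree = a.sum(axis=1, keepdims=True)
        normalized = a / np.maximum(degree, 1.0)  # rows without neighbors stay zero
        out = out + normalized @ (h @ w)         # averaged neighbor messages
    return np.maximum(out, 0.0)                  # ReLU (assumed activation)
```

Node embeddings are stored as rows here, so each transformation is written `h @ w` rather than as a matrix applied on the left of a column vector; the two forms are equivalent up to a transpose.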

2. The remote sensing image multi-class information extraction method based on deep high-resolution relation graph convolution according to claim 1, wherein S1 specifically comprises: with $S = \sqrt{N/K}$, selecting a cluster center from each of the divided blocks.

3. The remote sensing image multi-class information extraction method based on deep high-resolution relation graph convolution according to claim 2, wherein selecting a cluster center specifically comprises: to avoid a sampling point falling on an edge or on image noise, the sampling point is manually adjusted within its neighborhood by computing the gradients of the nearby pixels and selecting the pixel with the smallest gradient as the cluster center; within a range of 2S × 2S, the color distance $d_c$ and the spatial distance $d_s$ between each pixel and the cluster center are calculated; wherein the value of point i in RGB coordinates is $(l_i, a_i, b_i)$ and the value of point j in RGB coordinates is $(l_j, a_j, b_j)$; the value of point i in distance coordinates is $(x_i, y_i)$ and the value of point j in distance coordinates is $(x_j, y_j)$; $d_c$ and $d_s$ are then calculated as follows:

$$d_c = \sqrt{(l_j - l_i)^2 + (a_j - a_i)^2 + (b_j - b_i)^2}$$

$$d_s = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2}$$

after the distances are calculated, each pixel updates the image block to which it belongs; the pixels of each image block are averaged to obtain a new cluster center; and the preceding steps are repeated until the distance between the new and the previous cluster centers is smaller than a set threshold.
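The clustering loop described in claims 2 and 3 can be sketched as below. This is a simplified illustration rather than the claimed procedure: the gradient-based adjustment of the initial sampling points is omitted, the way $d_c$ and $d_s$ are merged into a single distance (with a compactness weight `m`) follows common SLIC practice and is an assumption not stated in the claims, convergence is measured on the spatial coordinates of the centers only, and the name `slic_sketch` and its parameters are hypothetical.

```python
import numpy as np


def slic_sketch(image, k, m=10.0, max_iter=10, tol=1.0):
    """Simplified SLIC-style clustering.

    image: (H, W, 3) float array of color values.
    k:     requested number of clusters K.
    m:     compactness weight (assumed; not given in the source).
    Returns an (H, W) integer label map and the cluster centers as rows
    [c1, c2, c3, y, x].
    """
    h, w, _ = image.shape
    s = max(int(np.sqrt(h * w / k)), 1)              # grid interval S = sqrt(N / K)
    ys = np.arange(s // 2, h, s)
    xs = np.arange(s // 2, w, s)
    centers = np.array([[*image[y, x], y, x] for y in ys for x in xs], dtype=float)
    yy, xx = np.mgrid[0:h, 0:w]
    labels = np.zeros((h, w), dtype=int)
    for _ in range(max_iter):
        best = np.full((h, w), np.inf)
        for idx, (c1, c2, c3, cy, cx) in enumerate(centers):
            # Search only a window of roughly 2S x 2S around the center.
            y0, y1 = max(int(cy) - s, 0), min(int(cy) + s + 1, h)
            x0, x1 = max(int(cx) - s, 0), min(int(cx) + s + 1, w)
            patch = image[y0:y1, x0:x1]
            d_c = np.sqrt(((patch - np.array([c1, c2, c3])) ** 2).sum(axis=2))
            d_s = np.sqrt((yy[y0:y1, x0:x1] - cy) ** 2 + (xx[y0:y1, x0:x1] - cx) ** 2)
            dist = np.sqrt((d_c / m) ** 2 + (d_s / s) ** 2)   # assumed combination
            closer = dist < best[y0:y1, x0:x1]
            best[y0:y1, x0:x1][closer] = dist[closer]
            labels[y0:y1, x0:x1][closer] = idx
        # Recompute each center as the mean of the pixels assigned to it.
        new_centers = centers.copy()
        for idx in range(len(centers)):
            mask = labels == idx
            if mask.any():
                new_centers[idx] = [*image[mask].mean(axis=0), yy[mask].mean(), xx[mask].mean()]
        shift = np.abs(new_centers[:, 3:] - centers[:, 3:]).max()
        centers = new_centers
        if shift < tol:                              # centers moved less than the threshold
            break
    return labels, centers
```

A small `m` makes color dominate the assignment, while a large `m` yields more compact, grid-like superpixels.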

4. The remote sensing image multi-class information extraction method based on deep high-resolution relation graph convolution according to claim 1 or 2, wherein the K categories are taken as K nodes, the features of each node are the features of the pixels at the corresponding positions in the feature map, and the number of channels of the feature map equals the number of feature dimensions of the nodes.
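One way to turn a feature map and a SLIC label map into K node feature vectors is sketched below. The claim does not say how the pixel features inside one superpixel are aggregated, so mean pooling is an assumption here, as is the premise that the label map has already been resized to the resolution of the feature map; the name `pool_node_features` is hypothetical.

```python
import numpy as np


def pool_node_features(feature_map, labels, k):
    """Average the feature-map pixels belonging to each SLIC category.

    feature_map: (C, H, W) array with C channels.
    labels:      (H, W) integer array with values in [0, k).
    Returns a (k, C) array: one C-dimensional feature vector per node.
    Assumption: mean pooling over the pixels of each superpixel.
    """
    channels = feature_map.shape[0]
    flat_features = feature_map.reshape(channels, -1).T   # (H*W, C)
    flat_labels = labels.reshape(-1)                       # (H*W,)
    sums = np.zeros((k, channels))
    np.add.at(sums, flat_labels, flat_features)            # per-node feature sums
    counts = np.bincount(flat_labels, minlength=k).reshape(-1, 1)
    return sums / np.maximum(counts, 1)                    # empty nodes stay zero
```

The result has as many columns as the feature map has channels, which matches the statement that the channel count equals the node feature dimension.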

5. A remote sensing image multi-class information extraction system based on deep high-resolution relation graph convolution, characterized by comprising:

a segmentation module, configured to divide feature maps of different dimensions according to the SLIC superpixel segmentation result of the original RGB image of the corresponding size, specifically: after the number of clusters K is manually set for images of different sizes, and assuming the original image has N pixels, the image is first divided into blocks of equal size, each block being of size S;

a classification module, configured to divide the feature map into K categories according to the result of the segmentation module, wherein each pixel has a corresponding SLIC category;

a learning module, configured to, for images of different resolutions, calculate the feature similarity between SLIC categories and divide it into two relations, similar and dissimilar, and to construct a heterogeneous graph with the two relations as edges and the SLIC categories as nodes for learning;

an extraction module, configured to restore the graph neural networks of different dimensions into feature maps and to complete the classification of each pixel in cooperation with a fully connected layer, so as to extract the target categories;

wherein the learning module, configured to learn from the heterogeneous graph with the two relations as edges and the SLIC categories as nodes, further comprises:

6. The remote sensing image multi-class information extraction system based on deep high-resolution relation graph convolution according to claim 5, wherein $S = \sqrt{N/K}$, and a cluster center is selected from each of the divided blocks.

7. The remote sensing image multi-class information extraction system based on deep high-resolution relation graph convolution according to claim 6, wherein selecting a cluster center specifically comprises: to avoid a sampling point falling on an edge or on image noise, the sampling point is manually adjusted within its neighborhood by computing the gradients of the nearby pixels and selecting the pixel with the smallest gradient as the cluster center; within a range of 2S × 2S, the color distance $d_c$ and the spatial distance $d_s$ between each pixel and the cluster center are calculated; wherein the value of point i in RGB coordinates is $(l_i, a_i, b_i)$ and the value of point j in RGB coordinates is $(l_j, a_j, b_j)$; the value of point i in distance coordinates is $(x_i, y_i)$ and the value of point j in distance coordinates is $(x_j, y_j)$; $d_c$ and $d_s$ are then calculated as follows:

$$d_c = \sqrt{(l_j - l_i)^2 + (a_j - a_i)^2 + (b_j - b_i)^2}$$

$$d_s = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2}$$

8. The remote sensing image multi-class information extraction system based on deep high-resolution relation graph convolution according to claim 5 or 6, wherein the K categories are taken as K nodes, the features of each node are the features of the pixels at the corresponding positions in the feature map, and the number of channels of the feature map equals the number of feature dimensions of the nodes.
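The extraction module of claim 5 runs this node assignment in reverse: node embeddings are written back to the pixel grid and each pixel is then classified. A minimal sketch for a single resolution is given below; how the feature maps of several resolutions are fused is not covered, the fully connected layer is reduced to one linear map applied independently at every pixel, and the names `restore_feature_map` and `classify_pixels` are hypothetical.

```python
import numpy as np


def restore_feature_map(node_embeddings, labels):
    """Scatter node embeddings back onto the pixel grid.

    node_embeddings: (K, D) array, one embedding per SLIC node.
    labels:          (H, W) integer array with values in [0, K).
    Returns a (D, H, W) feature map where every pixel carries the
    embedding of the node (SLIC category) it belongs to.
    """
    per_pixel = node_embeddings[labels]          # (H, W, D) via integer indexing
    return np.moveaxis(per_pixel, -1, 0)         # (D, H, W)


def classify_pixels(feature_map, weight, bias):
    """Apply one linear (fully connected) layer at every pixel.

    feature_map: (D, H, W); weight: (D, n_classes); bias: (n_classes,).
    Returns an (H, W) array holding the highest-scoring class per pixel.
    """
    logits = np.einsum("dhw,dc->hwc", feature_map, weight) + bias
    return logits.argmax(axis=-1)
```

Because every pixel of a superpixel receives the same embedding, the predicted class is constant within each SLIC category unless other per-pixel features are concatenated before the linear layer.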

9. A computer storage medium, wherein computer program instructions are stored in the computer storage medium, and when the computer program instructions are executed by a processor, the remote sensing image multi-class information extraction method based on deep high-resolution relation graph convolution according to any one of claims 1-4 is implemented.