CN116612385B - Remote sensing image multi-class information extraction method and system based on deep high-resolution relation graph convolution - Google Patents
- Publication number: CN116612385B
- Application number: CN202310578883.3A
- Authority: CN (China)
- Prior art keywords: pixel, graph, node, value, feature
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V20/176 — Scenes; terrestrial scenes; urban or other man-made structures
- G06N3/042 — Knowledge-based neural networks; logical representations of neural networks
- G06V10/267 — Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
- G06V10/56 — Extraction of image or video features relating to colour
- G06V10/762 — Pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
- G06V10/82 — Pattern recognition or machine learning using neural networks
- G06V20/70 — Labelling scene content, e.g. deriving syntactic or semantic representations
Abstract
The invention discloses a remote sensing image multi-class information extraction method and system based on deep high-resolution relation graph convolution, and a computer storage medium. The method comprises the following steps. S1: partition feature maps of different scales according to the SLIC superpixel segmentation result of the correspondingly sized original RGB image. S2: the segmentation divides the feature map into K categories, so that each pixel has a corresponding SLIC category. S3: for each resolution, construct a heterogeneous graph whose edges are the two relations (similar and dissimilar) and whose nodes are the SLIC categories, and learn on it. S4: restore the graph node embeddings at the different scales into feature maps and complete the classification of each pixel with a fully connected layer, thereby extracting the target classes. The method exploits high-resolution features and the relation information in the constructed graph structure, explicitly combines the multi-label classification scenario with graph learning, and provides a better modeling method for heterogeneous graphs with multiple classes.
Description
Technical Field
The invention relates to the fields of remote sensing, deep-learning computer vision, and graph learning, and in particular to a method and system for extracting multi-class information from remote sensing images based on deep high-resolution relation graph convolution.
Background
In recent years, semantic segmentation based on deep learning has become the dominant approach. Semantic segmentation classifies the pixels of an image one by one so as to extract the desired information categories. Traditional information extraction methods for remote sensing images rely mainly on conventional convolutional neural network frameworks for feature extraction. However, the convolutional receptive field is limited and mainly explores feature relations among local, small-range pixels, which limits the ability to capture the long-range dependency features of different labels in high-resolution remote sensing images under multi-label classification tasks.
Fully Convolutional Networks (FCNs) [Fully Convolutional Networks for Semantic Segmentation] first removed the fully connected layer and introduced an end-to-end training mode for semantic segmentation. However, the downsampled feature maps of FCNs destroy spatial information, so segmentation results tend to lose boundary information. SegNet [SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation] uses an encoder-decoder architecture and upsamples feature maps using the max-pooling position indices, so the spatial information and high-frequency details lost to max pooling can be recovered. U-Net [U-Net: Convolutional Networks for Biomedical Image Segmentation] upsamples feature maps with learnable transposed convolutions instead of interpolation, and its skip connections let the decoder recover the information lost at each stage to max pooling. The DeepLab [Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs] series of work focuses mainly on the study of atrous (dilated) convolution, which enlarges the receptive field of the convolution kernel while maintaining the spatial resolution of the image. The Atrous Spatial Pyramid Pooling (ASPP) module proposed in DeepLab [Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs] effectively captures multi-scale contextual semantic information. However, these methods perform poorly on small-scale object segmentation and easily lose boundary information. To address the loss of spatial information, Lin et al. proposed RefineNet [RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation], which feeds each layer of the encoder's feature maps into a refinement block that fully integrates spatial information at different resolutions; RefineNet therefore achieves image semantic segmentation with higher accuracy.
However, Convolutional Neural Networks (CNNs) attend only to small-scale, local image information and struggle to capture the long-range dependency features of high-resolution remote sensing images. To overcome this problem, Liang et al. [A Deep Neural Network Combined CNN and GCN for Remote Sensing Scene Classification] introduced superpixel nodes to generate a graph from the image, but did not consider a graph model with multiple labels. For remote sensing image scene classification, Li et al. [Multi-Label Remote Sensing Image Scene Classification by Combining a Convolutional Neural Network and a Graph Neural Network] use superpixels to generate image graph structures and combine long-range high-level semantic information with graph neural networks (GNNs), achieving good results. However, they do not explicitly combine the multi-label classification scenario with graph learning; for a heterogeneous graph with multiple classes, a better modeling approach exists.
In the invention, inspired by the High-Resolution Network (HRNet) [Deep High-Resolution Representation Learning for Human Pose Estimation] and the Relational Graph Convolutional Network (R-GCN) [Modeling Relational Data with Graph Convolutional Networks], a remote sensing image multi-class information extraction method based on deep high-resolution relation graph convolution is provided, which exploits high-resolution features and the relation information in the constructed graph structure.
Disclosure of Invention
In view of the above problems, the invention provides a remote sensing image multi-class information extraction method based on deep high-resolution relation graph convolution, which aims to solve the problem that existing methods do not explicitly combine the multi-label classification scenario with graph learning.
In order to achieve the above object, according to one aspect of the present invention, there is provided a remote sensing image multi-class information extraction method based on deep high-resolution relation graph convolution, comprising the following steps:

S1: partition the feature maps of different scales according to the SLIC superpixel segmentation result of the correspondingly sized original RGB image. Specifically: after the cluster number K is set manually for each image size, let the number of pixels of the original image be N; the image is first divided into equally sized blocks with grid spacing

$$S = \sqrt{N/K},$$

and a cluster center is selected within each divided block. To avoid placing a sampling point on an edge or in an image noise region, the center is adjusted within the neighborhood of the sampling point to the adjacent pixel with the smallest gradient, which is taken as the cluster center. Then, for each pixel within a 2S × 2S window around a cluster center, the color distance $d_c$ and the spatial distance $d_s$ to that center are computed. Let point $i$ have CIELAB color values $(l_i, a_i, b_i)$ and spatial coordinates $(x_i, y_i)$, and let point $j$ have color values $(l_j, a_j, b_j)$ and coordinates $(x_j, y_j)$; then $d_c$ and $d_s$ are calculated as follows:

$$d_c = \sqrt{(l_j - l_i)^2 + (a_j - a_i)^2 + (b_j - b_i)^2}, \qquad d_s = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2}.$$

After the distances are calculated, each pixel is reassigned to the image block (cluster) whose center is nearest; the pixels of each cluster are then averaged to obtain a new cluster center, and the previous steps are repeated until the movement of each cluster center between two iterations is smaller than a set threshold.
S2: the segmentation divides the feature map into K categories, and each pixel has a corresponding SLIC category. The K categories are then taken as K nodes; the features of a node are the feature-map values at the pixels belonging to that category, so the number of channels of the feature map equals the node feature dimension. Edges are added between adjacent categories. There are two edge types, determined by calculating the similarity between nodes as follows:
$$\operatorname{sim}(a, b) = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$$

where $a$ and $b$ are two $n$-dimensional vectors, $a_i$ is the $i$-th dimensional feature of node $a$, $b_i$ is the $i$-th dimensional feature of node $b$, and the value of $n$ depends on the number of channels of the feature map at this scale. After normalization, an edge whose value is greater than 0.5 is defined as a class-1 ("similar") edge; otherwise it is a class-0 ("dissimilar") edge, so that the sets of similar and dissimilar pixels are learned separately.
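A minimal sketch of this edge-typing step is shown below. The cosine form of the similarity and the mapping of its value into [0, 1] before thresholding at 0.5 are assumptions consistent with the description above; the adjacency list of spatially neighboring superpixels is taken as given.

```python
import numpy as np

def build_relation_edges(node_feats, adjacency, thresh=0.5):
    """Split the superpixel adjacency edges into two relation types.

    node_feats : (K, n) array, one n-dimensional feature vector per SLIC node
    adjacency  : iterable of (i, j) index pairs of spatially adjacent nodes
    Returns (similar_edges, dissimilar_edges).
    """
    norms = np.linalg.norm(node_feats, axis=1, keepdims=True)
    unit = node_feats / np.clip(norms, 1e-8, None)   # unit-length features
    similar, dissimilar = [], []
    for i, j in adjacency:
        sim = float(unit[i] @ unit[j])               # cosine similarity in [-1, 1]
        sim = (sim + 1.0) / 2.0                      # normalize to [0, 1] (assumed)
        (similar if sim > thresh else dissimilar).append((i, j))
    return similar, dissimilar
```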
S3: for images of different resolutions, the feature similarity between the SLIC categories is calculated and divided into the two relations, similar and dissimilar, and a heterogeneous graph with the two relations as edges and the SLIC categories as nodes is learned. Since the edges are divided into two classes according to node similarity, a different feature transformation matrix is learned for each edge type, with the following layer-update formula:
$$h_i^{(l+1)} = \sigma\!\left( W_0^{(l)} h_i^{(l)} + \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} \right)$$

where $h_i^{(l)}$ is the embedding of node $i$ at layer $l$, $h_i^{(l+1)}$ is its embedding at layer $l+1$, $\mathcal{N}_i^r$ is the set of neighbors of node $i$ under the $r$-th relation, $c_{i,r}$ is a normalization constant, $W_r^{(l)}$ is the feature transformation matrix of relation $r$ at layer $l$, and $W_0^{(l)}$ (the case $r = 0$) is the self-loop transformation matrix that carries a node's own features to the next layer. Here $r$ denotes the kind of relation: $r = 1$ means "similar" and $r = 2$ means "dissimilar".
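For illustration, a minimal PyTorch sketch of one such relational graph convolution layer is given below. The choice $c_{i,r} = |\mathcal{N}_i^r|$ and the ReLU activation are assumptions (they follow a common setting of the R-GCN paper), and the per-relation edge lists are those produced by the similarity step above.

```python
import torch
import torch.nn as nn

class RelGraphConvLayer(nn.Module):
    """One R-GCN layer with relation-specific weights W_r plus a self-loop
    transform W_0, following the update rule above."""

    def __init__(self, in_dim, out_dim, num_rels=2):
        super().__init__()
        self.w_rel = nn.ModuleList(
            nn.Linear(in_dim, out_dim, bias=False) for _ in range(num_rels))
        self.w_self = nn.Linear(in_dim, out_dim, bias=False)   # W_0

    def forward(self, h, edges_by_rel):
        """h: (K, in_dim) node embeddings; edges_by_rel: one (src, dst) pair
        of LongTensors per relation ("similar", "dissimilar")."""
        out = self.w_self(h)                        # W_0 h_i, self-loop term
        for r, (src, dst) in enumerate(edges_by_rel):
            msg = self.w_rel[r](h[src])             # W_r h_j for each edge j->i
            agg = torch.zeros_like(out).index_add_(0, dst, msg)
            deg = torch.zeros(h.size(0), device=h.device).index_add_(
                0, dst, torch.ones(dst.numel(), device=h.device))
            out = out + agg / deg.clamp(min=1).unsqueeze(1)    # 1 / c_{i,r}
        return torch.relu(out)                      # sigma = ReLU (assumed)
```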
S4: restore the graph neural network node embeddings at the different scales into feature maps, and complete the classification of each pixel with a fully connected layer, thereby extracting the target classes.
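The restoration step can be sketched as a scatter of node embeddings back onto the pixel grid, as below; the 1 × 1 convolution plays the role of the per-pixel fully connected classifier, and the channel and class counts are placeholder assumptions.

```python
import torch
import torch.nn as nn

def nodes_to_feature_map(node_emb, slic_labels):
    """Every pixel receives the embedding of the SLIC node it belongs to.

    node_emb    : (K, C) tensor of learned node embeddings
    slic_labels : (H, W) LongTensor of per-pixel SLIC node indices
    returns     : (C, H, W) dense feature map
    """
    return node_emb[slic_labels].permute(2, 0, 1)

# Per-pixel classification head: a 1 x 1 convolution applies the same
# fully connected layer at every pixel (sizes here are assumptions).
classifier = nn.Conv2d(in_channels=64, out_channels=6, kernel_size=1)

node_emb = torch.randn(100, 64)                      # K=100 nodes, C=64
slic_labels = torch.randint(0, 100, (256, 256))      # per-pixel node indices
logits = classifier(nodes_to_feature_map(node_emb, slic_labels).unsqueeze(0))
pred = logits.argmax(dim=1)                          # (1, H, W) class map
```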
In another aspect of the present invention, a remote sensing image multi-class information extraction system based on deep high-resolution relation graph convolution specifically comprises:
the segmentation module is used for dividing the feature images with different dimension according to the SLIC super-pixel segmentation result of the original RGB image with the corresponding size; the specific method comprises the following steps: after the clustering number K is manually set for the images with different sizes, the original image pixel number is assumed to be N, the images are firstly divided into blocks with the same size, and the size of each block is S;
the classification module is used for dividing the feature map into K categories according to the segmentation module, and each pixel has a corresponding SLIC category;
the learning module is used for dividing images with different resolutions into two similar and dissimilar relations by calculating the similarity of corresponding features of SLIC categories, and constructing a heterogeneous graph with the two relations as edges and SLIC categories as nodes for learning;
and the extraction module is used for restoring the graph neural network with different dimensions into a characteristic graph and completing classification of each pixel by matching with the full-connection layer so as to extract the target type.
In another aspect of the present invention, a computer storage medium stores computer program instructions that, when executed by a processor, implement any of the foregoing remote sensing image multi-class information extraction methods based on deep high-resolution relation graph convolution.
The proposed method was tested on the public Potsdam and Vaihingen datasets and compared with state-of-the-art methods; it outperforms all comparison methods under both the F1-score and IoU (Intersection over Union) accuracy metrics.
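For reference, the two reported metrics can be computed per class from the predicted and ground-truth label maps as in the following sketch (the default class count is an assumption):

```python
import numpy as np

def f1_and_iou(pred, gt, num_classes=6):
    """Per-class F1-score and IoU for dense label maps of equal shape."""
    f1, iou = [], []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (gt == c))     # true positives for class c
        fp = np.sum((pred == c) & (gt != c))     # false positives
        fn = np.sum((pred != c) & (gt == c))     # false negatives
        f1.append(2 * tp / max(2 * tp + fp + fn, 1))
        iou.append(tp / max(tp + fp + fn, 1))
    return np.array(f1), np.array(iou)
```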
Drawings
In order to more clearly illustrate the embodiments of the present invention and the technical solutions of the prior art, the drawings used in the description of the embodiments are briefly introduced below. The drawings illustrate only preferred embodiments and are not to be construed as limiting the invention. Like reference numerals designate like parts throughout the figures. In the drawings:
FIG. 1 shows a schematic diagram of remote sensing image multi-class information extraction based on deep high-resolution relation graph convolution;

FIG. 2 shows the category map obtained by segmenting the feature map.
Detailed Description
In order that the above-recited objects, features and advantages of the present application may be more clearly understood, embodiments of the invention are described in detail below with reference to the accompanying drawings. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without inventive effort fall within the scope of the present invention.
As shown in FIG. 1, HRNet was originally proposed to solve the human pose estimation problem. The network comprises four parallel subnetworks representing four different resolutions. As the depth increases, lower-resolution subnetworks are gradually added alongside the high-resolution subnetwork, and the multi-resolution subnetworks are connected in parallel. Unlike encoder-decoder architectures, the network does not need to upsample from a downsampled feature map; because HRNet always maintains a high-resolution feature representation, it naturally provides richer semantic features for semantic segmentation tasks. In the prior art, the feature maps of the four scales are finally resampled to the same scale for prediction. Here, instead, the feature maps of different scales are partitioned according to the SLIC superpixel segmentation result of the correspondingly sized original RGB image.
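As a toy illustration of this parallel multi-resolution idea (not HRNet itself), the sketch below keeps two branches at their own resolutions and exchanges resampled features between them; the channel counts, the two-branch setup, and the even spatial sizes are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchExchange(nn.Module):
    """Toy two-resolution exchange unit in the spirit of HRNet: each branch
    keeps its own resolution and adds a resampled copy of the other branch."""

    def __init__(self, c_hi=32, c_lo=64):
        super().__init__()
        self.hi_to_lo = nn.Conv2d(c_hi, c_lo, 3, stride=2, padding=1)
        self.lo_to_hi = nn.Conv2d(c_lo, c_hi, 1)

    def forward(self, x_hi, x_lo):
        # High-res branch receives bilinearly upsampled low-res features.
        up = F.interpolate(self.lo_to_hi(x_lo), size=x_hi.shape[-2:],
                           mode="bilinear", align_corners=False)
        # Low-res branch receives strided-conv downsampled high-res features.
        down = self.hi_to_lo(x_hi)
        return torch.relu(x_hi + up), torch.relu(x_lo + down)

x_hi, x_lo = torch.randn(1, 32, 64, 64), torch.randn(1, 64, 32, 32)
y_hi, y_lo = TwoBranchExchange()(x_hi, x_lo)   # each branch keeps its shape
```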
The specific method comprises the following steps. After the cluster number K is set manually for each image size, let the number of pixels of the original image be N; the image is first divided into equally sized blocks with grid spacing

$$S = \sqrt{N/K},$$

and a cluster center is selected within each divided block. To avoid placing a sampling point on an edge or in an image noise region, we adjust the center within the neighborhood of the sampling point to the adjacent pixel with the smallest gradient, which is taken as the cluster center, and compute, for each pixel within a 2S × 2S window around the cluster center, the color distance $d_c$ and the spatial distance $d_s$ to the center. Let point $i$ have CIELAB color values $(l_i, a_i, b_i)$ and spatial coordinates $(x_i, y_i)$, and point $j$ color values $(l_j, a_j, b_j)$ and coordinates $(x_j, y_j)$; then $d_c$ and $d_s$ are calculated as follows:

$$d_c = \sqrt{(l_j - l_i)^2 + (a_j - a_i)^2 + (b_j - b_i)^2}, \qquad d_s = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2}.$$

After the distances are calculated, each pixel is reassigned to the cluster whose center is nearest; the pixels of each cluster are averaged to obtain a new cluster center, and the previous steps are repeated until the movement of each cluster center between two iterations is smaller than a set threshold.
Segmenting the feature map thus yields K categories, as in FIG. 2, so each pixel has a corresponding SLIC category. The K categories are taken as K nodes; the features of a node are the feature-map values at the pixels belonging to that category, and the number of channels of the feature map equals the node feature dimension. Edges are added between adjacent categories. There are two edge types, determined by calculating the similarity between nodes as follows:

$$\operatorname{sim}(a, b) = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$$

where $a$ and $b$ are two $n$-dimensional vectors, $a_i$ is the $i$-th dimensional feature of node $a$, $b_i$ is the $i$-th dimensional feature of node $b$, and the value of $n$ depends on the number of channels of the feature map at this scale. After normalization, an edge whose value is greater than 0.5 is defined as a class-1 ("similar") edge; otherwise it is a class-0 ("dissimilar") edge, and the sets of similar and dissimilar pixels are learned separately. Constructing the resolution images of different sizes into a heterogeneous graph with the two relations as edges and the SLIC categories as nodes has two advantages. 1) The graph structure extends the scale over which the different-resolution feature maps update information, and is not limited to small, locally regular areas: the object of the convolution is not a pixel patch of fixed area but a node obtained by SLIC classification with a larger field of view; each such node corresponds to pixels with similar features, whose spatial distribution may be irregular, so the graph convolution operation in effect performs feature learning over many pixels at a large scale. 2) Building different graphs based on feature similarity aggregates the pixels into class sets, which improves the classification precision of the target pixels. After the graph is constructed, we learn according to the multi-relation graph neural network learning step. Since the edges are divided into two classes according to node similarity, a different feature transformation matrix is learned for each edge type, with the following formula:
$$h_i^{(l+1)} = \sigma\!\left( W_0^{(l)} h_i^{(l)} + \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} \right)$$

where $h_i^{(l)}$ is the embedding of node $i$ at layer $l$, $h_i^{(l+1)}$ is its embedding at layer $l+1$, $\mathcal{N}_i^r$ is the set of neighbors of node $i$ under the $r$-th relation, $c_{i,r}$ is a normalization constant, $W_r^{(l)}$ is the feature transformation matrix of relation $r$ at layer $l$, and $W_0^{(l)}$ (the case $r = 0$) is the self-loop transformation matrix that carries a node's own features to the next layer. Here $r$ denotes the kind of relation: $r = 1$ means "similar" and $r = 2$ means "dissimilar".
And finally, the graph neural network node embeddings at the different scales are restored into feature maps, and the classification of each pixel is completed with a fully connected layer, so that the target classes can be extracted.
In another aspect of the present invention, a remote sensing image multi-class information extraction system based on deep high-resolution relation graph convolution specifically comprises:
the segmentation module is used for dividing the feature images with different dimension according to the SLIC super-pixel segmentation result of the original RGB image with the corresponding size; the specific method comprises the following steps: after the clustering number K is manually set for the images with different sizes, the original image pixel number is assumed to be N, the images are firstly divided into blocks with the same size, and the size of each block is S; here, the And selecting a cluster center from the divided blocks. And selecting a cluster center from the divided blocks. Here, to avoid sampling pointsIn the edge or image noise part, the adjacent point of the pixel gradient in the area near one sampling point needs to be manually adjusted, the smallest gradient is selected as the clustering center, and the color distance d between the pixel point and the clustering center is calculated in the range of 2S x 2S c And a spatial distance d s The method comprises the steps of carrying out a first treatment on the surface of the Wherein the value of the i point in the RGB coordinates is (l) i ,a i ,b i ) The value of j point in RGB coordinates is (l) j ,a j ,b j ) The method comprises the steps of carrying out a first treatment on the surface of the The value of the i point in the distance coordinate is (x i ,y i ) The value of j point in the distance coordinate is (x j ,y j ) The method comprises the steps of carrying out a first treatment on the surface of the Then, dc and ds are calculated as follows:
after the distance is calculated, each pixel point updates the image block to which the pixel point belongs, averages the pixel points of the same image block to obtain a new clustering center, and then repeats the previous steps until the distance between the two clustering centers is smaller than a set threshold value.
The classification module is used for dividing the feature map into the K categories produced by the segmentation module, each pixel having a corresponding SLIC category. The K categories are taken as K nodes; the features of a node are the feature-map values at the pixels belonging to that category, and the number of channels of the feature map equals the node feature dimension. Edges are added between adjacent categories. There are two edge types, determined by calculating the similarity between nodes as follows:

$$\operatorname{sim}(a, b) = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$$

where $a$ and $b$ are two $n$-dimensional vectors, $a_i$ is the $i$-th dimensional feature of node $a$, $b_i$ is the $i$-th dimensional feature of node $b$, and the value of $n$ depends on the number of channels of the feature map at this scale. After normalization, an edge whose value is greater than 0.5 is defined as a class-1 ("similar") edge; otherwise it is a class-0 ("dissimilar") edge, and the sets of similar and dissimilar pixels are learned separately.
The learning module is used for calculating, for images of different resolutions, the feature similarity between SLIC categories, dividing it into the two relations, similar and dissimilar, and constructing a heterogeneous graph with the two relations as edges and the SLIC categories as nodes for learning. The K categories are taken as K nodes, the features of a node are the feature-map values at the pixels belonging to that category, and the number of channels of the feature map equals the node feature dimension. Since the edges are divided into two classes according to node similarity, a different feature transformation matrix is learned for each edge type, with the following formula:

$$h_i^{(l+1)} = \sigma\!\left( W_0^{(l)} h_i^{(l)} + \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} \right)$$

where $h_i^{(l)}$ is the embedding of node $i$ at layer $l$, $h_i^{(l+1)}$ is its embedding at layer $l+1$, $\mathcal{N}_i^r$ is the set of neighbors of node $i$ under the $r$-th relation, $c_{i,r}$ is a normalization constant, $W_r^{(l)}$ is the feature transformation matrix of relation $r$ at layer $l$, and $W_0^{(l)}$ (the case $r = 0$) is the self-loop transformation matrix that carries a node's own features to the next layer. Here $r$ denotes the kind of relation: $r = 1$ means "similar" and $r = 2$ means "dissimilar".
And the extraction module is used for restoring the graph node embeddings at the different scales into feature maps and completing the classification of each pixel with the fully connected layer, so as to extract the target classes.
The embodiment of the application also provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the program implements the processes of the above embodiments of the remote sensing image multi-class information extraction method based on deep high-resolution relation graph convolution and achieves the same technical effects; to avoid repetition, they are not described again here.
In this specification, each embodiment is described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
It should be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, apparatus, or computer storage medium. Accordingly, the present embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications to those embodiments may occur to those skilled in the art once the basic inventive concepts are known. It is therefore intended that the appended claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the present application.
Claims (9)
1. A remote sensing image multi-class information extraction method based on deep high-resolution relation graph convolution, characterized by comprising the following steps:
s1: partitioning feature maps of different scales according to the SLIC superpixel segmentation result of the correspondingly sized original RGB image; specifically: after the cluster number K is set manually for each image size, the number of pixels of the original image being N, the image is first divided into equally sized blocks, each block having grid spacing S;
s2: dividing the feature map by the segmentation to obtain K categories, wherein each pixel has a corresponding SLIC category;
s3: for images of different resolutions, calculating the feature similarity between SLIC categories, dividing it into the two relations, similar and dissimilar, and learning a heterogeneous graph with the two relations as edges and the SLIC categories as nodes;
s4: restoring the graph neural network node embeddings at the different scales into feature maps, and completing the classification of each pixel with a fully connected layer so as to extract the target classes;
the method is characterized by comprising the following steps of constructing a heterogeneous graph with two relations as edges and SLIC classification as nodes to learn, wherein the heterogeneous graph specifically comprises the following steps:
edges are added between the connected categories, the categories of the edges are two, the edges are judged by calculating the similarity between the nodes, and the specific calculation mode is as follows:
wherein a and b are two n-dimensional vectors, a i Representing the ith dimension characteristic of node a, b i The value of n representing the i-th dimensional feature of node b depends on the number of channels of the feature map at this time; after normalization, if the value is greater than 0.5, determining the value as 1 class edge, otherwise determining the value as 0 class, and learning the similar pixel set and the different pixel sets;
the method comprises the steps of,
according to the multi-relation graph neural network learning step, the edges are divided into two types according to the node similarity, and different feature transformation matrixes are learned according to different edge types, wherein the specific formula is as follows:
wherein,is the embedding of layer i node +.>Is the embedding of layer i of the l+1 node, of->Represents the neighbor node set of the node i under the r-th relation, c i,r Is constant, & lt>Represents the feature transformation matrix under the relation R of the first layer, when r=0, i.e. +.>Representing the relation feature transformation matrix of the node to the next layer; r represents the kind of relationship, r=1 represents "similar", and r=2 represents "dissimilar".
2. The remote sensing image multi-class information extraction method based on deep high-resolution relation graph convolution according to claim 1, wherein s1 specifically comprises: the grid spacing is $S = \sqrt{N/K}$, and a cluster center is selected within each divided block.
3. The remote sensing image multi-class information extraction method based on deep high-resolution relation graph convolution according to claim 2, wherein selecting a cluster center specifically comprises: to avoid placing a sampling point on an edge or in an image noise region, the center is adjusted within the neighborhood of the sampling point to the adjacent pixel with the smallest gradient, which is taken as the cluster center; the color distance $d_c$ and the spatial distance $d_s$ between each pixel and the cluster center are computed within a 2S × 2S window, where point $i$ has CIELAB color values $(l_i, a_i, b_i)$ and spatial coordinates $(x_i, y_i)$, and point $j$ has color values $(l_j, a_j, b_j)$ and coordinates $(x_j, y_j)$; then $d_c$ and $d_s$ are calculated as follows:

$$d_c = \sqrt{(l_j - l_i)^2 + (a_j - a_i)^2 + (b_j - b_i)^2}, \qquad d_s = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2};$$

after the distances are calculated, each pixel is reassigned to the cluster whose center is nearest, the pixels of each cluster are averaged to obtain a new cluster center, and the previous steps are repeated until the movement of each cluster center between two iterations is smaller than a set threshold.
4. The remote sensing image multi-class information extraction method based on deep high-resolution relation graph convolution according to claim 1 or 2, wherein the K categories are taken as K nodes, the features of a node are the feature-map values at the pixels belonging to that category, and the number of channels of the feature map is the same as the node feature dimension.
5. A remote sensing image multi-class information extraction system based on deep high-resolution relation graph convolution, characterized by comprising:
a segmentation module for partitioning feature maps of different scales according to the SLIC superpixel segmentation result of the correspondingly sized original RGB image; specifically: after the cluster number K is set manually for each image size, the number of pixels of the original image being N, the image is first divided into equally sized blocks, each block having grid spacing S;
a classification module for dividing the feature map into the K categories produced by the segmentation module, each pixel having a corresponding SLIC category;
a learning module for calculating, for images of different resolutions, the feature similarity between SLIC categories, dividing it into the two relations, similar and dissimilar, and constructing a heterogeneous graph with the two relations as edges and the SLIC categories as nodes for learning;
an extraction module for restoring the graph neural network node embeddings at the different scales into feature maps and completing the classification of each pixel with the fully connected layer so as to extract the target classes;
wherein the learning module configured to learn on the heterogeneous graph with the two relations as edges and the SLIC categories as nodes further comprises:
adding edges between adjacent categories, the edges being of two types determined by calculating the similarity between nodes as follows:

$$\operatorname{sim}(a, b) = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$$

where $a$ and $b$ are two $n$-dimensional vectors, $a_i$ is the $i$-th dimensional feature of node $a$, $b_i$ is the $i$-th dimensional feature of node $b$, and the value of $n$ depends on the number of channels of the feature map at this scale; after normalization, an edge whose value is greater than 0.5 is determined to be a class-1 ("similar") edge, otherwise a class-0 ("dissimilar") edge, and the sets of similar and dissimilar pixels are learned separately;
and, according to the multi-relation graph neural network learning step, dividing the edges into two types according to node similarity and learning a different feature transformation matrix for each edge type, with the following formula:

$$h_i^{(l+1)} = \sigma\!\left( W_0^{(l)} h_i^{(l)} + \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} \right)$$

where $h_i^{(l)}$ is the embedding of node $i$ at layer $l$, $h_i^{(l+1)}$ is its embedding at layer $l+1$, $\mathcal{N}_i^r$ is the set of neighbors of node $i$ under the $r$-th relation, $c_{i,r}$ is a normalization constant, $W_r^{(l)}$ is the feature transformation matrix of relation $r$ at layer $l$, and $W_0^{(l)}$ (the case $r = 0$) is the self-loop transformation matrix of the node to the next layer; $r$ denotes the kind of relation, with $r = 1$ meaning "similar" and $r = 2$ meaning "dissimilar".
6. The remote sensing image multi-class information extraction system based on deep high-resolution relation graph convolution according to claim 5, wherein the grid spacing is $S = \sqrt{N/K}$ and a cluster center is selected within each divided block.
7. The remote sensing image multi-class information extraction system based on deep high-resolution relation graph convolution according to claim 6, wherein selecting a cluster center specifically comprises: to avoid placing a sampling point on an edge or in an image noise region, the center is adjusted within the neighborhood of the sampling point to the adjacent pixel with the smallest gradient, which is taken as the cluster center; the color distance $d_c$ and the spatial distance $d_s$ between each pixel and the cluster center are computed within a 2S × 2S window, where point $i$ has CIELAB color values $(l_i, a_i, b_i)$ and spatial coordinates $(x_i, y_i)$, and point $j$ has color values $(l_j, a_j, b_j)$ and coordinates $(x_j, y_j)$; then $d_c$ and $d_s$ are calculated as follows:

$$d_c = \sqrt{(l_j - l_i)^2 + (a_j - a_i)^2 + (b_j - b_i)^2}, \qquad d_s = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2};$$

after the distances are calculated, each pixel is reassigned to the cluster whose center is nearest, the pixels of each cluster are averaged to obtain a new cluster center, and the previous steps are repeated until the movement of each cluster center between two iterations is smaller than a set threshold.
8. The remote sensing image multi-class information extraction system based on deep high-resolution relation graph convolution according to claim 5 or 6, wherein the K categories are taken as K nodes, the features of a node are the feature-map values at the pixels belonging to that category, and the number of channels of the feature map is the same as the node feature dimension.
9. A computer storage medium, wherein computer program instructions are stored in the computer storage medium, and when the computer program instructions are executed by a processor, the remote sensing image multi-class information extraction method based on deep high-resolution relation graph convolution according to any one of claims 1 to 4 is implemented.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310578883.3A (CN116612385B) | 2023-05-22 | 2023-05-22 | Remote sensing image multi-class information extraction method and system based on deep high-resolution relation graph convolution |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN116612385A | 2023-08-18 |
| CN116612385B | 2024-01-26 |
Family ID: 87683052

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310578883.3A (CN116612385B, active) | Remote sensing image multi-class information extraction method and system based on deep high-resolution relation graph convolution | 2023-05-22 | 2023-05-22 |
Families Citing this family (1)

| Publication Number | Priority Date | Publication Date | Assignee | Title |
|---|---|---|---|---|
| CN118446938B * | 2024-07-08 | 2024-09-10 | 浙江国遥地理信息技术有限公司 | Shadow area restoration method and device for remote sensing image and electronic equipment |
Patent Citations (2)

| Publication Number | Priority Date | Publication Date | Title |
|---|---|---|---|
| CN112364730A * | 2020-10-29 | 2021-02-12 | Hyperspectral ground object automatic classification method and system based on sparse subspace clustering |
| WO2023077816A1 * | 2021-11-03 | 2023-05-11 | Boundary-optimized remote sensing image semantic segmentation method and apparatus, and device and medium |
Non-Patent Citations (3)

| Title |
|---|
| Application of an improved SLIC algorithm in color image segmentation; Guo Yanjie, Yang Ming, Hou Yuchao; Journal of Chongqing University of Technology (Natural Science), No. 2 * |
| High-resolution remote sensing image segmentation based on simple linear iterative clustering; Dong Zhipeng, Mei Xiaoming, Chen Jie, Deng Min, Li Xin; Remote Sensing Information, No. 6 * |
| A remote sensing image segmentation method combining SLIC and fuzzy clustering; Yang Liyan, Zhao Yu'e, Huang Liang; Software, No. 12 * |
Also Published As

| Publication Number | Publication Date |
|---|---|
| CN116612385A | 2023-08-18 |
Legal Events

- PB01 — Publication
- SE01 — Entry into force of request for substantive examination
- GR01 — Patent grant