CN116612385B - Remote sensing image multiclass information extraction method and system based on depth high-resolution relation graph convolution - Google Patents

Info

Publication number
CN116612385B
Authority
CN
China
Prior art keywords
pixel
graph
node
value
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310578883.3A
Other languages
Chinese (zh)
Other versions
CN116612385A (en
Inventor
陈嘉辉
彭玲
王寅达
杨丽娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aerospace Information Research Institute of CAS
Original Assignee
Aerospace Information Research Institute of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aerospace Information Research Institute of CAS
Priority to CN202310578883.3A
Publication of CN116612385A
Application granted
Publication of CN116612385B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/176 Urban or other man-made structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/042 Knowledge-based neural networks; Logical representations of neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/70 Labelling scene content, e.g. deriving syntactic or semantic representations

Abstract

The invention discloses a remote sensing image multiclass information extraction method and system based on deep high-resolution relation graph convolution, and a computer storage medium. The method comprises the following steps. S1: dividing feature maps of different sizes according to the SLIC superpixel segmentation result of the original RGB image of the corresponding size. S2: dividing the feature map to obtain K categories, each pixel having a corresponding SLIC category. S3: constructing the images of different resolutions into heterogeneous graphs, with two relationships as edges and the SLIC categories as nodes, for learning. S4: restoring the outputs of the graph neural networks of different sizes into feature maps and classifying each pixel with a fully connected layer, thereby extracting the target categories. The method uses both the high-resolution features and the relationship information in the generated graph structure, explicitly combines the multi-label classification scenario with graph learning, and provides a better modeling approach for heterogeneous graphs with multiple classes.

Description

Remote sensing image multiclass information extraction method and system based on depth high-resolution relation graph convolution
Technical Field
The invention relates to the field of remote sensing, the field of deep learning computer vision and the field of graph learning, in particular to a method and a system for extracting multi-class information of remote sensing images based on deep high-resolution relation graph convolution.
Background
In recent years, semantic segmentation based on deep learning has become the dominant approach. Semantic segmentation classifies the pixels of an image one by one in order to extract the desired information categories. When applied to remote sensing images, traditional information extraction methods rely mainly on conventional convolutional neural network frameworks to extract features. However, the receptive field of a convolution is limited and mainly explores feature relationships among pixels in a small local neighborhood, which limits the ability to capture long-range dependencies between different labels in high-resolution remote sensing images under a multi-label classification task.
Fully convolutional networks (FCNs) [Fully Convolutional Networks for Semantic Segmentation] were the first to remove the fully connected layers and introduce end-to-end training for semantic segmentation. However, the downsampled feature maps of FCNs lose spatial information, so the segmentation results tend to lose boundary information. SegNet [SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation] uses an encoder-decoder architecture and upsamples the feature map using the max-pooling indices, so that the spatial information and high-frequency details lost to max pooling can be recovered. U-Net [U-Net: Convolutional Networks for Biomedical Image Segmentation] upsamples the feature map with learnable transposed convolutions instead of interpolation, and its skip connections give the decoder access to the information lost to max pooling at each stage. The DeepLab [Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs] series of work focuses mainly on atrous (dilated) convolution, which enlarges the receptive field of the convolution kernel while maintaining the spatial resolution of the image. The Atrous Spatial Pyramid Pooling (ASPP) module proposed in DeepLab [Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs] can effectively capture multi-scale contextual semantic information. However, this method is less effective at segmenting small objects, and boundary information is easily lost. To address the loss of spatial information, Lin et al. proposed RefineNet [Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation], which feeds the feature map of each encoder layer into a refinement block that can fully integrate spatial information at different resolutions. RefineNet therefore achieves higher accuracy on image semantic segmentation.
However, convolutional neural networks (CNNs) focus only on small-scale, local image information and have difficulty capturing the long-range dependencies in high-resolution remote sensing images. To overcome this problem, Liang et al. [A Deep Neural Network Combined CNN and GCN for Remote Sensing Scene Classification] introduced superpixel nodes in the image to generate a graph, but did not consider a graph model with multiple labels. For remote sensing image scene classification, Li et al. [Multi-Label Remote Sensing Image Scene Classification by Combining a Convolutional Neural Network and a Graph Neural Network] used superpixels to generate a graph structure from the image and combined long-range high-level semantic information with graph neural networks (GNNs), achieving good results. However, that work does not explicitly combine the multi-label classification scenario with graph learning, and a better modeling approach exists for heterogeneous graphs with multiple classes.
Inspired by the high-resolution network (HRNet) [Deep High-Resolution Representation Learning for Human Pose Estimation] and the relational graph convolutional network (R-GCN) [Modeling Relational Data with Graph Convolutional Networks], the invention provides a remote sensing image multiclass information extraction method based on deep high-resolution relation graph convolution, which uses both the high-resolution features and the relationship information in the generated graph structure.
Disclosure of Invention
In view of the above problems, the invention provides a remote sensing image multiclass information extraction method based on deep high-resolution relation graph convolution, which aims to solve the problem that existing methods do not explicitly combine the multi-label classification scenario with graph learning.
In order to achieve the above object, according to one aspect of the present invention, there is provided a remote sensing image multiclass information extraction method based on depth high resolution relation graph convolution, the method comprising the steps of:
S1: dividing the feature maps of different sizes according to the SLIC superpixel segmentation result of the original RGB image of the corresponding size. The specific method is as follows: after the number of clusters K is manually set for the images of different sizes, and assuming that the original image has N pixels, the image is first divided into blocks of the same size, each of size S, where $S = \sqrt{N/K}$, and a cluster center is selected in each block. To avoid a sampling point falling on an edge or on image noise, the sampling point is adjusted within its neighborhood: the gradients of the nearby pixels are examined and the pixel with the smallest gradient is selected as the cluster center. The color distance $d_c$ and the spatial distance $d_s$ between each pixel and the cluster center are then calculated within a 2S × 2S range. Let the value of point i in RGB coordinates be $(l_i, a_i, b_i)$ and that of point j be $(l_j, a_j, b_j)$; let the position of point i in spatial coordinates be $(x_i, y_i)$ and that of point j be $(x_j, y_j)$. Then $d_c$ and $d_s$ are calculated as follows:
$$d_c = \sqrt{(l_j - l_i)^2 + (a_j - a_i)^2 + (b_j - b_i)^2}$$
$$d_s = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2}$$
After the distances are calculated, each pixel updates the image block to which it belongs, the pixels of the same block are averaged to obtain a new cluster center, and the preceding steps are repeated until the distance between the new and the previous cluster centers is smaller than a set threshold.
S2: dividing the feature map to obtain K categories, each pixel having a corresponding SLIC category. The K categories are taken as K nodes; the features of a node are the features of its pixels at the corresponding positions of the feature map, so the number of channels of the feature map equals the feature dimension of the nodes. Edges are added between connected categories. There are two classes of edges, determined by calculating the similarity between nodes. The specific calculation is as follows:
where a and b are two n-dimensional vectors, $a_i$ is the i-th dimensional feature of node a, $b_i$ is the i-th dimensional feature of node b, and the value of n depends on the number of channels of the feature map at that stage. After normalization, if the value is greater than 0.5 the edge is defined as a class 1 edge; otherwise it is defined as a class 0 edge. In this way sets of similar pixels and sets of dissimilar pixels are learned.
S3: for the images of different resolutions, the similarity of the features corresponding to the SLIC categories is calculated and divided into two relationships, similar and dissimilar, and a heterogeneous graph with the two relationships as edges and the SLIC categories as nodes is constructed for learning. Since the edges are divided into two classes according to node similarity, a different feature transformation matrix is learned for each edge type, using the following formula:
$$h_i^{(l+1)} = \sigma\left(\sum_{r \in R} \sum_{j \in N_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)}\right)$$
where $h_i^{(l)}$ is the embedding of node i at layer l, $h_i^{(l+1)}$ is the embedding of node i at layer l+1, $\sigma$ is the activation function, $N_i^r$ is the set of neighbor nodes of node i under the r-th relation, $c_{i,r}$ is a constant, and $W_r^{(l)}$ is the feature transformation matrix for relation r at layer l; when r = 0, $W_0^{(l)}$ is the feature transformation matrix that carries the node itself to the next layer. r denotes the kind of relationship: r = 1 means "similar" and r = 2 means "dissimilar".
S4: restoring the outputs of the graph neural networks of different sizes into feature maps and classifying each pixel with a fully connected layer, thereby extracting the target categories.
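The graph construction of steps S2 and S3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: node features are obtained by mean-pooling the feature map over each superpixel, "connected" categories are taken to be superpixels that touch horizontally or vertically, and the similarity is assumed to be cosine similarity rescaled from [-1, 1] to [0, 1], since the similarity formula itself is not reproduced in this text. The function names `cosine_similarity` and `build_typed_graph` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two n-dimensional feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_typed_graph(labels, feature_map, threshold=0.5):
    """Build nodes and typed edges from a superpixel label map.

    labels:      H x W nested list of SLIC category ids.
    feature_map: H x W nested list of C-dimensional feature lists.
    Returns (nodes, edges): nodes maps category id -> mean feature vector,
    edges maps (u, v) with u < v -> 1 (similar) or 0 (dissimilar).
    """
    sums, counts, pairs = {}, {}, set()
    height, width = len(labels), len(labels[0])
    for y in range(height):
        for x in range(width):
            k = labels[y][x]
            feat = feature_map[y][x]
            if k not in sums:
                sums[k] = [0.0] * len(feat)
                counts[k] = 0
            sums[k] = [s + f for s, f in zip(sums[k], feat)]
            counts[k] += 1
            # Assumption: categories are "connected" when they share a
            # horizontal or vertical pixel boundary.
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < height and nx < width and labels[ny][nx] != k:
                    other = labels[ny][nx]
                    pairs.add((min(k, other), max(k, other)))
    # Assumption: a node feature is the mean of its pixels' features.
    nodes = {k: [s / counts[k] for s in sums[k]] for k in sums}
    edges = {}
    for u, v in pairs:
        # Assumed normalization: map cosine similarity from [-1, 1] to [0, 1].
        normalized = (cosine_similarity(nodes[u], nodes[v]) + 1.0) / 2.0
        edges[(u, v)] = 1 if normalized > threshold else 0
    return nodes, edges
```

The returned edge classes correspond to the class 1 (similar) and class 0 (dissimilar) edges described in step S2; one such graph would be built for each feature-map resolution.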
Another aspect of the invention provides a remote sensing image multiclass information extraction system based on deep high-resolution relation graph convolution, which specifically includes:
a segmentation module, used for dividing the feature maps of different sizes according to the SLIC superpixel segmentation result of the original RGB image of the corresponding size; the specific method is as follows: after the number of clusters K is manually set for the images of different sizes, and assuming that the original image has N pixels, the image is first divided into blocks of the same size, each of size S;
a classification module, used for dividing the feature map into K categories according to the output of the segmentation module, each pixel having a corresponding SLIC category;
a learning module, used for calculating, for the images of different resolutions, the similarity of the features corresponding to the SLIC categories, dividing it into two relationships, similar and dissimilar, and constructing a heterogeneous graph with the two relationships as edges and the SLIC categories as nodes for learning;
an extraction module, used for restoring the outputs of the graph neural networks of different sizes into feature maps and classifying each pixel with a fully connected layer, thereby extracting the target categories.
Another aspect of the invention provides a computer storage medium storing computer program instructions that, when executed by a processor, implement any of the foregoing remote sensing image multiclass information extraction methods based on deep high-resolution relation graph convolution.
The proposed method was tested on the public Potsdam and Vaihingen datasets and compared with state-of-the-art methods; it outperforms all compared methods on both the F1 (F1-score) and IoU (Intersection over Union) accuracy metrics.
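For reference, the two metrics named above can be computed per class from pixel counts as in the sketch below. The function name `f1_and_iou` and the example counts are hypothetical and are not results from the patent.

```python
def f1_and_iou(tp, fp, fn):
    """Per-class F1-score and IoU from true/false positive and false negative pixel counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    union = tp + fp + fn
    iou = tp / union if union else 0.0
    return f1, iou


# Illustrative counts only: 8 correctly labelled pixels, 2 false alarms, 2 misses.
f1, iou = f1_and_iou(tp=8, fp=2, fn=2)
```

For a single class the two are related by F1 = 2·IoU / (1 + IoU), so F1 is never lower than IoU.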
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the description of the prior art will be briefly described below, and it is apparent that the drawings in the following description are only for the purpose of illustrating the preferred embodiments and are not to be construed as limiting the present invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 shows a schematic diagram of remote sensing image multi-class information extraction based on depth high resolution relationship graph convolution;
FIG. 2 shows the category map obtained by dividing the feature map.
Detailed Description
To make the above objects, features and advantages of the present application clearer, embodiments of the invention are described in detail below with reference to the accompanying drawings. Other embodiments that those skilled in the art may obtain on the basis of the embodiments of the present invention without inventive effort fall within the scope of the present invention.
As shown in FIG. 1, HRNet was originally proposed to solve the human pose estimation problem. The network comprises four parallel sub-networks representing four different resolutions. As the depth increases, lower-resolution sub-networks are gradually added alongside the high-resolution sub-network, and the multi-resolution sub-networks are connected in parallel. Unlike encoder-decoder architectures, the network does not need to upsample from downsampled feature maps. Because HRNet maintains a high-resolution feature representation throughout, it provides richer semantic features for semantic segmentation tasks. In the prior art, the feature maps of the four sizes are finally resampled to the same size for prediction. In the present invention, the feature maps of different sizes are instead divided according to the SLIC superpixel segmentation result of the original RGB image of the corresponding size.
The specific method is as follows: after the number of clusters K is manually set for the images of different sizes, and assuming that the original image has N pixels, the image is first divided into blocks of the same size, each of size S, where $S = \sqrt{N/K}$, and a cluster center is selected in each block. To avoid a sampling point falling on an edge or on image noise, the sampling point is adjusted within its neighborhood: the gradients of the nearby pixels are examined and the pixel with the smallest gradient is selected as the cluster center. The color distance $d_c$ and the spatial distance $d_s$ between each pixel and the cluster center are then calculated within a 2S × 2S range. Let the value of point i in RGB coordinates be $(l_i, a_i, b_i)$ and that of point j be $(l_j, a_j, b_j)$; let the position of point i in spatial coordinates be $(x_i, y_i)$ and that of point j be $(x_j, y_j)$. Then $d_c$ and $d_s$ are calculated as follows:
$$d_c = \sqrt{(l_j - l_i)^2 + (a_j - a_i)^2 + (b_j - b_i)^2}$$
$$d_s = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2}$$
After the distances are calculated, each pixel updates the image block to which it belongs, the pixels of the same block are averaged to obtain a new cluster center, and the preceding steps are repeated until the distance between the new and the previous cluster centers is smaller than a set threshold.
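One assignment-and-update pass of the clustering described above can be sketched as follows. This is a simplified illustration, not the patented implementation. Two things are assumptions: the grid interval is taken as S = sqrt(N / K), as in the standard SLIC algorithm, and the two distances are combined as D = sqrt(d_c² + (d_s / S)² · m²) with a compactness weight m, also following the original SLIC algorithm, because the text does not state how d_c and d_s are combined. The names `grid_interval` and `slic_iteration` are hypothetical.

```python
import math


def grid_interval(num_pixels, num_clusters):
    """Grid interval S = sqrt(N / K) between the initial cluster centers."""
    return math.sqrt(num_pixels / num_clusters)


def slic_iteration(pixels, centers, s, m=10.0):
    """Run one assignment and center-update pass.

    pixels and centers are sequences of (x, y, c1, c2, c3) tuples, where
    (c1, c2, c3) is the color triple. Returns (labels, new_centers, shift),
    where shift is the largest spatial movement of any cluster center.
    """
    labels = []
    for px, py, p1, p2, p3 in pixels:
        best_k, best_d = -1, math.inf
        for k, (cx, cy, c1, c2, c3) in enumerate(centers):
            # Only centers whose 2S x 2S window contains the pixel are searched.
            if abs(px - cx) > s or abs(py - cy) > s:
                continue
            d_c = math.sqrt((p1 - c1) ** 2 + (p2 - c2) ** 2 + (p3 - c3) ** 2)
            d_s = math.sqrt((px - cx) ** 2 + (py - cy) ** 2)
            # Assumed combination of color and spatial distance (standard SLIC).
            dist = math.sqrt(d_c ** 2 + (d_s / s) ** 2 * m ** 2)
            if dist < best_d:
                best_k, best_d = k, dist
        labels.append(best_k)  # stays -1 if no center window covers the pixel

    new_centers, shift = [], 0.0
    for k, old in enumerate(centers):
        members = [p for p, lab in zip(pixels, labels) if lab == k]
        if not members:
            new_centers.append(tuple(old))
            continue
        # The new center is the mean position and mean color of its members.
        mean = tuple(sum(p[dim] for p in members) / len(members) for dim in range(5))
        shift = max(shift, math.hypot(mean[0] - old[0], mean[1] - old[1]))
        new_centers.append(mean)
    return labels, new_centers, shift
```

In use, the pass would be repeated, feeding `new_centers` back in, until `shift` falls below the chosen threshold.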
In this way, segmenting the feature map yields K categories, as shown in FIG. 2, and each pixel has a corresponding SLIC category. The K categories are taken as K nodes; the features of a node are the features of its pixels at the corresponding positions of the feature map, so the number of channels of the feature map equals the feature dimension of the nodes. Edges are added between connected categories. There are two classes of edges, determined by calculating the similarity between nodes. The specific calculation is as follows:
where a and b are two n-dimensional vectors, $a_i$ is the i-th dimensional feature of node a, $b_i$ is the i-th dimensional feature of node b, and the value of n depends on the number of channels of the feature map at that stage. After normalization, if the value is greater than 0.5 the edge is defined as a class 1 edge; otherwise it is defined as a class 0 edge. In this way sets of similar pixels and sets of dissimilar pixels are learned. Constructing the images of different resolutions into a heterogeneous graph, with the two relationships as edges and the SLIC categories as nodes, has two advantages: 1) The graph structure enlarges the scale over which information is updated in the feature maps of different resolutions, so that updates are no longer limited to small, regular local regions. The object of the convolution is not a fixed region of pixels but a node obtained by SLIC classification, which has a larger field of view; each such node corresponds to pixels with similar features, and the spatial distribution of these pixels may be irregular. A convolution on the graph structure therefore in effect performs feature learning over large groups of pixels. 2) Constructing the heterogeneous graph on the basis of feature similarity aggregates pixels into sets of the same class, which improves the classification accuracy of the target pixels. After the graph is constructed, learning follows the steps of a multi-relational graph neural network. Since the edges are divided into two classes according to node similarity, a different feature transformation matrix is learned for each edge type, using the following formula:
$$h_i^{(l+1)} = \sigma\left(\sum_{r \in R} \sum_{j \in N_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)}\right)$$
where $h_i^{(l)}$ is the embedding of node i at layer l, $h_i^{(l+1)}$ is the embedding of node i at layer l+1, $\sigma$ is the activation function, $N_i^r$ is the set of neighbor nodes of node i under the r-th relation, $c_{i,r}$ is a constant, and $W_r^{(l)}$ is the feature transformation matrix for relation r at layer l; when r = 0, $W_0^{(l)}$ is the feature transformation matrix that carries the node itself to the next layer. r denotes the kind of relationship: r = 1 means "similar" and r = 2 means "dissimilar".
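A single relational graph convolution layer of the kind described above can be sketched in plain Python as follows. This is a minimal sketch under assumptions rather than the patented implementation: the constant c_{i,r} is taken to be the number of neighbors of node i under relation r, and the activation is taken to be ReLU; both are common choices for R-GCN but are not stated in this text. The names `matvec`, `relu` and `rgcn_layer` are hypothetical.

```python
def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def relu(v):
    """Element-wise ReLU (assumed activation)."""
    return [max(0.0, x) for x in v]


def rgcn_layer(h, neighbors, w_rel, w_self):
    """Compute the layer l+1 embeddings from the layer l embeddings.

    h:         list of node embeddings at layer l.
    neighbors: dict relation -> dict node id -> list of neighbor node ids.
    w_rel:     dict relation -> transformation matrix W_r for that relation.
    w_self:    transformation matrix W_0 applied to the node itself.
    """
    out = []
    for i, h_i in enumerate(h):
        acc = matvec(w_self, h_i)  # self-connection term W_0 h_i
        for r, adjacency in neighbors.items():
            nbrs = adjacency.get(i, [])
            if not nbrs:
                continue
            c_ir = len(nbrs)  # assumption: c_{i,r} = |N_i^r|
            for j in nbrs:
                message = matvec(w_rel[r], h[j])
                acc = [a + msg / c_ir for a, msg in zip(acc, message)]
        out.append(relu(acc))
    return out
```

Because each relation has its own matrix, messages from "similar" neighbors and from "dissimilar" neighbors are transformed differently before they are summed.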
Finally, the outputs of the graph neural networks of different sizes are restored into feature maps, and each pixel is classified with a fully connected layer, so that the target categories can be extracted.
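The restoration step can be illustrated as follows for a single resolution. This sketch assumes that restoring means copying each node's embedding back to every pixel of its superpixel, and that the fully connected layer is one linear map followed by an argmax over class scores; the fusion of several resolutions is omitted. The names `nodes_to_feature_map`, `linear` and `classify_pixels` are hypothetical.

```python
def nodes_to_feature_map(labels, node_features):
    """Scatter each node's embedding back to all pixels of its superpixel."""
    return [[node_features[k] for k in row] for row in labels]


def linear(x, weight, bias):
    """Fully connected layer: one score per class."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def classify_pixels(feature_map, weight, bias):
    """Assign each pixel the class with the highest score."""
    result = []
    for row in feature_map:
        out_row = []
        for feat in row:
            scores = linear(feat, weight, bias)
            out_row.append(max(range(len(scores)), key=scores.__getitem__))
        result.append(out_row)
    return result
```

Since all pixels of one superpixel share the same restored embedding in this sketch, they receive the same class.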
Another aspect of the invention provides a remote sensing image multiclass information extraction system based on deep high-resolution relation graph convolution, which specifically includes:
a segmentation module, used for dividing the feature maps of different sizes according to the SLIC superpixel segmentation result of the original RGB image of the corresponding size. The specific method is as follows: after the number of clusters K is manually set for the images of different sizes, and assuming that the original image has N pixels, the image is first divided into blocks of the same size, each of size S, where $S = \sqrt{N/K}$, and a cluster center is selected in each block. To avoid a sampling point falling on an edge or on image noise, the sampling point is adjusted within its neighborhood: the gradients of the nearby pixels are examined and the pixel with the smallest gradient is selected as the cluster center. The color distance $d_c$ and the spatial distance $d_s$ between each pixel and the cluster center are then calculated within a 2S × 2S range. Let the value of point i in RGB coordinates be $(l_i, a_i, b_i)$ and that of point j be $(l_j, a_j, b_j)$; let the position of point i in spatial coordinates be $(x_i, y_i)$ and that of point j be $(x_j, y_j)$. Then $d_c$ and $d_s$ are calculated as follows:
$$d_c = \sqrt{(l_j - l_i)^2 + (a_j - a_i)^2 + (b_j - b_i)^2}$$
$$d_s = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2}$$
After the distances are calculated, each pixel updates the image block to which it belongs, the pixels of the same block are averaged to obtain a new cluster center, and the preceding steps are repeated until the distance between the new and the previous cluster centers is smaller than a set threshold.
a classification module, used for dividing the feature map into K categories according to the output of the segmentation module, each pixel having a corresponding SLIC category. The K categories are taken as K nodes; the features of a node are the features of its pixels at the corresponding positions of the feature map, so the number of channels of the feature map equals the feature dimension of the nodes. Edges are added between connected categories. There are two classes of edges, determined by calculating the similarity between nodes. The specific calculation is as follows:
where a and b are two n-dimensional vectors, $a_i$ is the i-th dimensional feature of node a, $b_i$ is the i-th dimensional feature of node b, and the value of n depends on the number of channels of the feature map at that stage. After normalization, if the value is greater than 0.5 the edge is defined as a class 1 edge; otherwise it is defined as a class 0 edge. In this way sets of similar pixels and sets of dissimilar pixels are learned.
a learning module, used for calculating, for the images of different resolutions, the similarity of the features corresponding to the SLIC categories, dividing it into two relationships, similar and dissimilar, and constructing a heterogeneous graph with the two relationships as edges and the SLIC categories as nodes for learning. The K categories are taken as K nodes; the features of a node are the features of its pixels at the corresponding positions of the feature map, so the number of channels of the feature map equals the feature dimension of the nodes. Since the edges are divided into two classes according to node similarity, a different feature transformation matrix is learned for each edge type, using the following formula:
$$h_i^{(l+1)} = \sigma\left(\sum_{r \in R} \sum_{j \in N_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)}\right)$$
where $h_i^{(l)}$ is the embedding of node i at layer l, $h_i^{(l+1)}$ is the embedding of node i at layer l+1, $\sigma$ is the activation function, $N_i^r$ is the set of neighbor nodes of node i under the r-th relation, $c_{i,r}$ is a constant, and $W_r^{(l)}$ is the feature transformation matrix for relation r at layer l; when r = 0, $W_0^{(l)}$ is the feature transformation matrix that carries the node itself to the next layer. r denotes the kind of relationship: r = 1 means "similar" and r = 2 means "dissimilar".
an extraction module, used for restoring the outputs of the graph neural networks of different sizes into feature maps and classifying each pixel with a fully connected layer, thereby extracting the target categories.
An embodiment of the application also provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements each process of the above embodiment of the remote sensing image multiclass information extraction method based on deep high-resolution relation graph convolution and achieves the same technical effects; to avoid repetition, the details are not described again here.
In this specification, the embodiments are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
It should be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, apparatus, or computer storage medium. Accordingly, the present embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications to those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the present application.

Claims (9)

1. A remote sensing image multiclass information extraction method based on depth high resolution relation graph convolution, characterized by comprising the following steps:
S1: dividing feature maps of different dimensions according to the SLIC superpixel segmentation result of the original RGB image of the corresponding size; specifically: after the number of clusters K is manually set for images of different sizes, and assuming that the original image has N pixels, the image is first divided into blocks of equal size, each block having size S;
S2: dividing the feature map to obtain K categories, wherein each pixel has a corresponding SLIC category;
S3: for images of different resolutions, computing the similarity of the features corresponding to the SLIC categories, dividing the relations into two types, similar and dissimilar, and constructing a heterogeneous graph with the two relations as edges and the SLIC categories as nodes for learning;
S4: restoring the graph neural networks of different dimensions into feature maps, and classifying each pixel with a fully connected layer so as to extract the target categories;
wherein constructing the heterogeneous graph with the two relations as edges and the SLIC categories as nodes for learning specifically comprises:
edges are added between connected categories; there are two types of edge, and the type of each edge is determined by calculating the similarity between the nodes, the specific calculation being as follows:
wherein a and b are two n-dimensional vectors, $a_i$ represents the i-th dimension feature of node a, $b_i$ represents the i-th dimension feature of node b, and the value of n depends on the number of channels of the feature map at this time; after normalization, if the value is greater than 0.5, the edge is determined to be a class-1 edge, otherwise a class-0 edge, so that similar pixel sets and dissimilar pixel sets are learned;
in the multi-relation graph neural network learning step, the edges are divided into two types according to node similarity, and a different feature transformation matrix is learned for each edge type, the specific formula being as follows:
$$h_i^{(l+1)} = \sigma\left(\sum_{r=1}^{2}\sum_{j \in N_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)}\right)$$
wherein $h_i^{(l)}$ is the embedding of node i at layer l, $h_i^{(l+1)}$ is the embedding of node i at layer l+1, $N_i^r$ represents the set of neighbor nodes of node i under the r-th relation, $c_{i,r}$ is a constant, and $W_r^{(l)}$ represents the feature transformation matrix under relation r at layer l; when r = 0, $W_0^{(l)}$ represents the feature transformation matrix from the node itself to the next layer; r represents the kind of relation, r = 1 representing "similar" and r = 2 representing "dissimilar".
2. The remote sensing image multiclass information extraction method based on depth high resolution relation graph convolution according to claim 1, wherein S1 specifically comprises: here, $S = \sqrt{N/K}$; and selecting a cluster center from the divided blocks.
3. The remote sensing image multiclass information extraction method based on depth high resolution relation graph convolution according to claim 2, wherein selecting a cluster center specifically comprises: in order to avoid a sampling point falling on an edge or on image noise, the sampling point is manually adjusted within its neighborhood according to the gradients of the nearby pixels, the point with the smallest gradient being selected as the cluster center; the color distance $d_c$ and the spatial distance $d_s$ between a pixel point and the cluster center are calculated within a range of 2S × 2S; wherein the value of point i in the RGB coordinates is $(l_i, a_i, b_i)$ and the value of point j in the RGB coordinates is $(l_j, a_j, b_j)$; the value of point i in the distance coordinates is $(x_i, y_i)$ and the value of point j in the distance coordinates is $(x_j, y_j)$; $d_c$ and $d_s$ are then calculated as follows:
$$d_c = \sqrt{(l_j - l_i)^2 + (a_j - a_i)^2 + (b_j - b_i)^2}, \qquad d_s = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2}$$
after the distances are calculated, each pixel point updates the image block to which it belongs, the pixel points of the same image block are averaged to obtain a new cluster center, and the preceding steps are repeated until the distance between the previous and the new cluster centers is smaller than a set threshold.
4. The remote sensing image multiclass information extraction method based on depth high resolution relation graph convolution according to claim 1 or 2, wherein the K categories are taken as K nodes, the features of each node are the features of the pixels at the corresponding positions in the feature map, and the number of channels of the feature map is the same as the feature dimension of the nodes.
5. A remote sensing image multiclass information extraction system based on depth high resolution relation graph convolution, characterized by comprising:
a segmentation module, configured to divide feature maps of different dimensions according to the SLIC superpixel segmentation result of the original RGB image of the corresponding size; specifically: after the number of clusters K is manually set for images of different sizes, and assuming that the original image has N pixels, the image is first divided into blocks of equal size, each block having size S;
a classification module, configured to divide the feature map into K categories according to the result of the segmentation module, each pixel having a corresponding SLIC category;
a learning module, configured to, for images of different resolutions, compute the similarity of the features corresponding to the SLIC categories, divide the relations into two types, similar and dissimilar, and construct a heterogeneous graph with the two relations as edges and the SLIC categories as nodes for learning;
an extraction module, configured to restore the graph neural networks of different dimensions into feature maps and to classify each pixel with a fully connected layer so as to extract the target categories;
wherein the learning module, in constructing the heterogeneous graph with the two relations as edges and the SLIC categories as nodes for learning, is further configured as follows:
edges are added between connected categories; there are two types of edge, and the type of each edge is determined by calculating the similarity between the nodes, the specific calculation being as follows:
wherein a and b are two n-dimensional vectors, $a_i$ represents the i-th dimension feature of node a, $b_i$ represents the i-th dimension feature of node b, and the value of n depends on the number of channels of the feature map at this time; after normalization, if the value is greater than 0.5, the edge is determined to be a class-1 edge, otherwise a class-0 edge, so that similar pixel sets and dissimilar pixel sets are learned;
in the multi-relation graph neural network learning step, the edges are divided into two types according to node similarity, and a different feature transformation matrix is learned for each edge type, the specific formula being as follows:
$$h_i^{(l+1)} = \sigma\left(\sum_{r=1}^{2}\sum_{j \in N_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)}\right)$$
wherein $h_i^{(l)}$ is the embedding of node i at layer l, $h_i^{(l+1)}$ is the embedding of node i at layer l+1, $N_i^r$ represents the set of neighbor nodes of node i under the r-th relation, $c_{i,r}$ is a constant, and $W_r^{(l)}$ represents the feature transformation matrix under relation r at layer l; when r = 0, $W_0^{(l)}$ represents the feature transformation matrix from the node itself to the next layer; r represents the kind of relation, r = 1 representing "similar" and r = 2 representing "dissimilar".
6. The remote sensing image multiclass information extraction system based on depth high resolution relation graph convolution according to claim 5, wherein $S = \sqrt{N/K}$, and a cluster center is selected from the divided blocks.
7. The remote sensing image multiclass information extraction system based on depth high resolution relation graph convolution according to claim 6, wherein selecting a cluster center specifically comprises: in order to avoid a sampling point falling on an edge or on image noise, the sampling point is manually adjusted within its neighborhood according to the gradients of the nearby pixels, the point with the smallest gradient being selected as the cluster center; the color distance $d_c$ and the spatial distance $d_s$ between a pixel point and the cluster center are calculated within a range of 2S × 2S; wherein the value of point i in the RGB coordinates is $(l_i, a_i, b_i)$ and the value of point j in the RGB coordinates is $(l_j, a_j, b_j)$; the value of point i in the distance coordinates is $(x_i, y_i)$ and the value of point j in the distance coordinates is $(x_j, y_j)$; $d_c$ and $d_s$ are then calculated as follows:
$$d_c = \sqrt{(l_j - l_i)^2 + (a_j - a_i)^2 + (b_j - b_i)^2}, \qquad d_s = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2}$$
after the distances are calculated, each pixel point updates the image block to which it belongs, the pixel points of the same image block are averaged to obtain a new cluster center, and the preceding steps are repeated until the distance between the previous and the new cluster centers is smaller than a set threshold.
8. The remote sensing image multiclass information extraction system based on depth high resolution relation graph convolution according to claim 5 or 6, wherein the K categories are taken as K nodes, the features of each node are the features of the pixels at the corresponding positions in the feature map, and the number of channels of the feature map is the same as the feature dimension of the nodes.
9. A computer storage medium storing computer program instructions which, when executed by a processor, implement the remote sensing image multiclass information extraction method based on depth high resolution relation graph convolution according to any one of claims 1-4.
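By way of non-limiting illustration only, the distance computation of claim 3 and the graph learning step of claim 1 might be sketched in Python with NumPy as follows. The function names (`slic_distances`, `edge_relation`, `rgcn_layer`) are hypothetical and do not appear in the application. The sketch assumes that the node similarity is a cosine similarity rescaled to [0, 1], that the constant $c_{i,r}$ equals the number of neighbors of node i under relation r, that the activation is ReLU, and that "similar" and "dissimilar" edges are indexed as relations 1 and 2; none of these choices is confirmed by the claim text.

```python
import numpy as np

def slic_distances(color_i, color_j, xy_i, xy_j):
    """Euclidean color distance d_c and spatial distance d_s between points i and j."""
    d_c = float(np.linalg.norm(np.asarray(color_j, float) - np.asarray(color_i, float)))
    d_s = float(np.linalg.norm(np.asarray(xy_j, float) - np.asarray(xy_i, float)))
    return d_c, d_s

def edge_relation(a, b, threshold=0.5):
    """Return relation 1 ("similar") or 2 ("dissimilar") for two node feature vectors.

    ASSUMPTION: cosine similarity, rescaled from [-1, 1] to [0, 1] before thresholding.
    """
    a, b = np.asarray(a, float), np.asarray(b, float)
    cos = float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    score = (cos + 1.0) / 2.0
    return 1 if score > threshold else 2

def rgcn_layer(h, neighbors, weights):
    """One relational graph convolution step over K superpixel nodes.

    h:         (K, d_in) node embeddings at layer l
    neighbors: dict r -> list of K neighbor-index lists (relation r = 1 or 2)
    weights:   dict r -> (d_in, d_out) matrix; r = 0 is the self-connection
    ASSUMPTIONS: c_{i,r} = number of neighbors under relation r; ReLU activation.
    """
    out = h @ weights[0]  # self-connection term W_0 h_i
    for r, adjacency in neighbors.items():
        for i, nbrs in enumerate(adjacency):
            if nbrs:  # skip nodes with no neighbor under this relation
                out[i] += (h[nbrs] @ weights[r]).sum(axis=0) / len(nbrs)
    return np.maximum(out, 0.0)

# Toy example: three superpixel nodes with 2-channel features, two connected pairs.
h = np.array([[1.0, 0.0], [1.0, 0.1], [-1.0, 0.0]])
pairs = [(0, 1), (1, 2)]
neighbors = {1: [[] for _ in h], 2: [[] for _ in h]}
for i, j in pairs:
    r = edge_relation(h[i], h[j])
    neighbors[r][i].append(j)
    neighbors[r][j].append(i)
weights = {r: np.eye(2) for r in (0, 1, 2)}  # identity matrices, for illustration only
h_next = rgcn_layer(h, neighbors, weights)
```

In this toy example nodes 0 and 1 are joined by a "similar" edge and nodes 1 and 2 by a "dissimilar" edge; in practice each relation would have its own learned matrix, so that the two edge types contribute differently to the updated embedding.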
CN202310578883.3A 2023-05-22 2023-05-22 Remote sensing image multiclass information extraction method and system based on depth high-resolution relation graph convolution Active CN116612385B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310578883.3A CN116612385B (en) 2023-05-22 2023-05-22 Remote sensing image multiclass information extraction method and system based on depth high-resolution relation graph convolution

Publications (2)

Publication Number Publication Date
CN116612385A CN116612385A (en) 2023-08-18
CN116612385B CN116612385B (en) 2024-01-26

Family

ID=87683052

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310578883.3A Active CN116612385B (en) 2023-05-22 2023-05-22 Remote sensing image multiclass information extraction method and system based on depth high-resolution relation graph convolution

Country Status (1)

Country Link
CN (1) CN116612385B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112364730A (en) * 2020-10-29 2021-02-12 济南大学 Hyperspectral ground object automatic classification method and system based on sparse subspace clustering
WO2023077816A1 (en) * 2021-11-03 2023-05-11 中国华能集团清洁能源技术研究院有限公司 Boundary-optimized remote sensing image semantic segmentation method and apparatus, and device and medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
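Application of an improved SLIC algorithm in color image segmentation; 郭艳婕; 杨明; 侯宇超; 重庆理工大学学报(自然科学) (No. 02); full text *
High-resolution remote sensing image segmentation by simple linear iterative clustering; 董志鹏; 梅小明; 陈杰; 邓敏; 李昕; 遥感信息 (No. 06); full text *
A remote sensing image segmentation method combining SLIC and fuzzy clustering; 杨丽艳; 赵玉娥; 黄亮; 软件 (No. 12); full text *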

Also Published As

Publication number Publication date
CN116612385A (en) 2023-08-18

Similar Documents

Publication Publication Date Title
CN111539370B (en) Image pedestrian re-identification method and system based on multi-attention joint learning
CN110176027B (en) Video target tracking method, device, equipment and storage medium
CN106845487B (en) End-to-end license plate identification method
CN109583340B (en) Video target detection method based on deep learning
CN110866896B (en) Image saliency target detection method based on k-means and level set super-pixel segmentation
CN110334762B (en) Feature matching method based on quad tree combined with ORB and SIFT
WO2016159199A1 (en) Method for re-identification of objects
CN111639564B (en) Video pedestrian re-identification method based on multi-attention heterogeneous network
CN109086777B (en) Saliency map refining method based on global pixel characteristics
CN107564009B (en) Outdoor scene multi-target segmentation method based on deep convolutional neural network
CN113011329A (en) Pyramid network based on multi-scale features and dense crowd counting method
CN113870335A (en) Monocular depth estimation method based on multi-scale feature fusion
CN110310305B (en) Target tracking method and device based on BSSD detection and Kalman filtering
Chen et al. Dr-tanet: Dynamic receptive temporal attention network for street scene change detection
CN109657538B (en) Scene segmentation method and system based on context information guidance
CN110852327A (en) Image processing method, image processing device, electronic equipment and storage medium
CN114445651A (en) Training set construction method and device of semantic segmentation model and electronic equipment
CN113850136A (en) Yolov5 and BCNN-based vehicle orientation identification method and system
CN115410030A (en) Target detection method, target detection device, computer equipment and storage medium
CN115018039A (en) Neural network distillation method, target detection method and device
CN116863194A (en) Foot ulcer image classification method, system, equipment and medium
CN116229406B (en) Lane line detection method, system, electronic equipment and storage medium
CN113066089A (en) Real-time image semantic segmentation network based on attention guide mechanism
CN113096032A (en) Non-uniform blur removing method based on image area division
CN116612385B (en) Remote sensing image multiclass information extraction method and system based on depth high-resolution relation graph convolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant