CN116486169B - Remote sensing image target motion direction discriminating method - Google Patents

Remote sensing image target motion direction discriminating method

Info

Publication number
CN116486169B
CN116486169B CN202310477115.9A
Authority
CN
China
Prior art keywords
segmentation
attention
semantic
target
remote sensing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310477115.9A
Other languages
Chinese (zh)
Other versions
CN116486169A (en)
Inventor
李梓桢
金世超
贺广均
冯鹏铭
梁颖
陈千千
上官博屹
常江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Satellite Information Engineering
Original Assignee
Beijing Institute of Satellite Information Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Satellite Information Engineering filed Critical Beijing Institute of Satellite Information Engineering
Priority to CN202310477115.9A priority Critical patent/CN116486169B/en
Publication of CN116486169A publication Critical patent/CN116486169A/en
Application granted granted Critical
Publication of CN116486169B publication Critical patent/CN116486169B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Animal Behavior & Ethology (AREA)

Abstract

The invention relates to a remote sensing image target motion direction discriminating method, which comprises the following steps: S100, carrying out panoramic segmentation labeling and target behavior motion direction labeling on a remote sensing image, and determining a target motion direction knowledge graph; S200, establishing a panoramic segmentation model based on the remote sensing image, using ResNet as the feature extraction backbone network, and introducing a cross-attention module to extract long-range context information; S300, performing instance-level segmentation on the remote sensing image according to the instance segmentation branch of the panoramic segmentation network, and performing semantic-level segmentation according to the semantic segmentation branch; S400, introducing a Bayesian-decision-based branch fusion module and carrying out decision fusion on the results of the instance segmentation branch and the semantic segmentation branch to generate a panoramic segmentation image; S500, carrying out pixel clustering on the panoramic segmentation image to generate a scene information knowledge graph; S600, discriminating the motion direction of the target of interest in the scene information knowledge graph according to the graph attention network. The invention can infer the behavior and motion information of targets in remote sensing images.

Description

Remote sensing image target motion direction discriminating method
Technical Field
The invention relates to the technical field of remote sensing, and in particular to a remote sensing image target motion direction discriminating method.
Background
Target motion direction discrimination for high-resolution remote sensing images refers to judging, for targets such as aircraft, ships, and vehicles in a remote sensing image, their motion state (for example taxiing, taking off, or parking) according to the explicit or latent interaction between the target and its environment or between the target and other targets.
Existing approaches fall into two categories. The first is manual interpretation, which requires related personnel to have extensive knowledge and experience, and suffers from poor real-time performance, high labor and time consumption, high cost, and susceptibility to subjective judgment. The second is discrimination based on time-series images; although this approach has been successful in the autonomous driving field, remote sensing images differ from driving imagery in that continuous image sequences are difficult to obtain, and they additionally exhibit small target scales and complex scene environments, so motion discrimination methods from the intelligent driving field cannot be applied directly.
Intelligently extracting the key features of ground-object targets from high-resolution remote sensing images and reasoning over them with remote sensing image panoramic segmentation and graph attention network inference techniques enables behavior and motion direction judgment for targets of interest in high-resolution remote sensing images, and therefore has important theoretical research and application value.
Disclosure of Invention
In view of this, the present invention aims to provide a remote sensing image target motion direction discriminating method, which obtains scene and target information by performing panoramic segmentation on the remote sensing image, generates a scene information knowledge graph using a knowledge graph generation method, and performs knowledge reasoning with a graph attention network to obtain the behavior and motion direction information of the target.
The remote sensing image target motion direction discriminating method provided by the embodiment of the invention comprises the following steps:
S100, carrying out panoramic segmentation labeling and target behavior motion direction labeling on the remote sensing image, and determining a target motion direction knowledge graph according to the labeling result;
S200, establishing a panoramic segmentation model based on the remote sensing image, using ResNet as the feature extraction backbone network, and introducing a cross-attention module to extract long-range context information;
S300, performing instance-level segmentation on the remote sensing image according to the instance segmentation branch of the panoramic segmentation network, and performing semantic-level segmentation on the remote sensing image according to the semantic segmentation branch of the panoramic segmentation network;
S400, introducing a Bayesian-decision-based branch fusion module, and carrying out decision fusion on the results of the instance segmentation branch and the semantic segmentation branch to generate a panoramic segmentation image;
S500, carrying out pixel clustering on the panoramic segmentation image to generate a scene information knowledge graph;
S600, discriminating the motion direction of the target of interest in the scene information knowledge graph according to the graph attention network.
Preferably, the S100 includes:
S110, carrying out instance-level labeling on foreground targets and semantic-level labeling on background targets;
S120, additionally labeling behavior motion information for the targets of interest;
S130, using a pixel-clustering-based knowledge graph generation method, taking the panoramic segmentation image and the target behavior motion information as input to generate the target motion direction knowledge graph.
Preferably, in S110:
performing instance-level labeling on the foreground targets, including pixel-level classification on the foreground targets according to semantic categories, and dividing the instances so that different foreground targets in the same semantic category have different instance numbers;
carrying out semantic level labeling on background targets, including carrying out pixel level classification on different background targets according to semantic categories;
the S130 includes:
Let the number of semantic categories in the panoramic segmentation image be N and the number of instance objects be M, and let each pixel point P_{x,y} be represented by a quadruple (x, y, cls, ins), where x and y are the coordinates of pixel P_{x,y}, cls = 1, 2, ..., N is the semantic category of pixel P_{x,y}, and ins = 1, 2, ..., M is the instance number of pixel P_{x,y}; then:
S131, creating an empty knowledge graph G(V, E), where V and E are the sets of nodes and edges respectively, both initially empty;
S132, selecting any pixel point P_{x,y} as the current clustering focus, creating a cluster v_i = {P_{x,y} | ins ∈ P_{x,y}, ins = i}, adding node v_i to the knowledge graph G, and adding the category and behavior of the clustering focus P_{x,y} to the knowledge graph G as attributes of v_i;
S133, traversing the pixel points P_{x',y'} adjacent to the clustering focus P_{x,y}: if pixel P_{x',y'} belongs to cluster v_i (its instance number satisfies ins = i), adding it to cluster v_i; otherwise, creating the cluster v_j = {P_{x,y} | ins ∈ P_{x,y}, ins = j} to which it belongs, adding node v_j together with its category and behavior as attributes of v_j to the knowledge graph G, and generating an edge e_{ij} between v_i and v_j;
S134, taking the pixel point P_{x',y'} in place of the original clustering focus P_{x,y} as the new clustering focus, and repeating S133 until every pixel belongs to a cluster;
S135, taking the resulting knowledge graph G(V, E) as the target motion direction knowledge graph, in which every node v has a category and a behavior motion direction as attributes.
Preferably, the S200 includes:
S210, establishing a panoramic segmentation model based on the remote sensing image, using ResNet as the feature extraction backbone network, and denoting the extracted feature map as A ∈ R^{C×H×W}, where C is the number of channels of the feature map and H×W is the size of the feature map;
S220, applying 1×1 convolutions to feature map A for channel compression, obtaining feature maps P ∈ R^{1×H×W} and Q ∈ R^{1×H×W};
S230, for each pixel u in feature map P, calculating the correlation e_{ui} (i = 1, 2, ..., H×W−1) between u and the pixels μ_1, μ_2, μ_3, ..., μ_{H×W−1} in the same row and column at the corresponding position in feature map Q, normalizing with softmax to obtain the cross-attention x_{ui} of pixel u,
and determining the cross-attention matrix X ∈ R^{(H×W−1)×H×W} from the cross-attention values x_{ui};
S240, generating the feature map E ∈ R^{C×H×W} from the cross-attention matrix X and the feature map A as the output of the cross-attention module:
E = ω(X × A) + A
where ω is a learnable weight parameter.
Preferably, the S300 includes:
Based on the output E ∈ R^{C×H×W} of the cross-attention module, the instance segmentation branch uses region proposals to perform bounding-box regression on candidate regions, generates the prediction probability of each target category, and generates a segmentation mask within each bounding box through convolution layers, obtaining the instance-level segmentation of the remote sensing image;
based on the output E ∈ R^{C×H×W} of the cross-attention module, the semantic segmentation branch uses convolution layers to extract semantic features from the remote sensing image and generates a segmentation mask for each semantic category, obtaining the semantic-level segmentation of the remote sensing image.
Preferably, the S400 includes:
S410, denoting the N semantic categories as {cls_1, cls_2, ..., cls_N} and the classification results of the instance segmentation branch and the semantic segmentation branch at a given pixel as Y = {y_1, y_2}, the posterior probability that y_k belongs to any category is P(cls_i | y_k),
where P(cls_i | y_k), k = 1, 2, are the posterior probabilities of the instance segmentation branch and the semantic segmentation branch for the different semantic categories;
S420, for each pixel, calculating the joint probability distribution under Bayesian theory as:
P(cls_i | Y) = P(cls_i | y_1) P(cls_i | y_2)
S430, determining the pixel category according to the maximum a posteriori criterion, the final semantic category to which the pixel belongs being:
Class(Y) = argmax_i (P(cls_i | Y)), i = 1, 2, ..., N
where Class(Y) is the category to which pixel Y belongs;
S440, applying Bayesian decision fusion to the pixels where the two branch predictions conflict and merging the remaining non-conflicting predictions, thereby obtaining the panoramic segmentation image.
Preferably, the S500 includes:
using the panoramic segmentation image as input, generating a scene information knowledge graph that does not contain target behavior motion information according to the method of S130.
Preferably, the S600 includes:
S610, taking the scene information knowledge graph as input and feeding it into a graph attention network;
S620, for each node in the scene information knowledge graph, calculating the correlation between node v_i and every other node v_j within its neighborhood N_i as the attention coefficient e_{ij}:
e_{ij} = σ(a(Wh_i || Wh_j))
where h_i and h_j are the features of nodes v_i and v_j respectively, a and W are a learnable weight vector and weight matrix, || is the concatenation operation, σ is a nonlinear function, and the neighborhood N_i consists of first-order or second-order neighbors;
S630, normalizing with softmax, the attention matrix is obtained as:
α_{ij} = exp(e_{ij}) / Σ_{k∈N_i} exp(e_{ik})
S640, introducing a multi-head attention mechanism and using K independent attention heads, the attention features are obtained as:
h′_i = ||_{k=1}^{K} σ( Σ_{j∈N_i} α^k_{ij} W^k h_j )
where h′_i is the output feature of node v_i after the multi-head attention mechanism, and α^k and W^k are the attention matrix and the weight matrix on attention head k;
S650, extracting the feature information of the scene information knowledge graph with the graph attention network and reasoning about the behavior motion information of the target of interest.
According to the embodiment of the invention, a pixel-clustering-based knowledge graph generation method is designed so that the target motion direction knowledge graph is generated quickly from the panoramic segmentation image of the high-resolution remote sensing image; by introducing the cross-attention module, the long-range context extraction capability of the feature extraction network is enhanced; by constructing the panoramic segmentation model, instance-level and semantic-level segmentations of the high-resolution remote sensing image are obtained respectively; by introducing the Bayesian-decision-based branch fusion module, the conflicts that arise when the two branch results are fused are resolved and a high-precision panoramic segmentation result is obtained, achieving the goal of scene segmentation; by constructing a graph attention network, attention is focused on the explicit and latent associations between targets, and between targets and the environment, in the scene, realizing behavior motion direction reasoning for the target of interest; the remote sensing image target motion direction discriminating method provided by the embodiment of the invention can rapidly understand the scene environment of key areas of interest, intelligently infer the motion direction of the target of interest, and realize highly timely, high-precision, and highly intelligent target motion direction discrimination for high-resolution remote sensing images.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a remote sensing image target motion direction judging method according to an embodiment of the invention;
fig. 2 is a schematic diagram of a remote sensing image target motion direction discrimination method according to an embodiment of the present invention;
fig. 3 (1)-(3) are schematic diagrams of the steps of the data generation stage of the remote sensing image target motion direction discriminating method according to an embodiment of the present invention;
FIG. 4 is a schematic sub-flowchart of a remote sensing image target motion direction determination method according to an embodiment of the present invention;
FIG. 5 is a schematic view of a panoramic segmentation model according to an embodiment of the present invention;
fig. 6 is a schematic diagram of a cross-attention module according to an embodiment of the present invention.
Detailed Description
The description of the embodiments of this specification should be read in conjunction with the accompanying drawings, which are to be considered part of the complete written description. In the drawings, the shape or thickness of features may be enlarged or indicated simply for convenience. Furthermore, parts of the structures in the drawings are described separately; it should be noted that elements not shown or described in the drawings are of a form known to those of ordinary skill in the art.
Any references to directions and orientations in the description of the embodiments herein are for convenience only and should not be construed as limiting the scope of the invention in any way. The following description of the preferred embodiments will refer to combinations of features, which may be present alone or in combination, and the invention is not particularly limited to the preferred embodiments. The scope of the invention is defined by the claims.
As shown in fig. 1 and fig. 2, the remote sensing image target motion direction discrimination method in the embodiment of the invention takes a pixel-clustering-based knowledge graph generation method as its core, organically combines high-resolution remote sensing image panoramic segmentation with graph attention network reasoning, and can be divided into three stages: data generation, panoramic segmentation, and knowledge reasoning. The data generation stage comprises step S100, the panoramic segmentation stage comprises steps S200-S500, and the knowledge reasoning stage comprises step S600.
As shown in fig. 2 and 3, in the data generation stage, the panoramic segmentation labels of the high-resolution remote sensing image are taken as input; these include both pixel-level classification of the image and instance-level segmentation of the "foreground" targets of interest. For the targets of interest in the panoramic segmentation image, additional target behavior motion direction labeling is needed, after which a target motion direction knowledge graph is generated using the pixel-clustering-based knowledge graph generation method. Fig. 3 uses an airport aircraft target as an illustration for ease of understanding, but this is not intended to limit the present embodiment.
As shown in fig. 1, the data generation stage includes:
S100, carrying out panoramic segmentation labeling and target behavior motion direction labeling on the remote sensing image, and determining a target motion direction knowledge graph according to the labeling result.
As shown in fig. 4, in the present embodiment, S100 specifically includes:
S110, performing instance-level labeling on the foreground targets and semantic-level labeling on the background targets.
The foreground targets are given instance-level labels: they are classified at the pixel level according to semantic category and divided into instances, so that different foreground targets within the same semantic category receive different instance numbers. Typical foreground targets include aircraft, vehicles, and ships.
The background targets are given semantic-level labels: different background targets are classified at the pixel level according to semantic category, and background targets are not divided into instances. Typical background targets are tarmac, runways, and vegetation.
S120, additionally labeling behavior motion information for the targets of interest.
Taking the case where the target of interest is an aircraft, the behavior motion information may include motion information for behaviors such as take-off, landing, taxiing, and berthing; the behavior motion information of other targets (i.e., targets not of particular interest) and of the background defaults to null.
S130, using the pixel-clustering-based knowledge graph generation method, taking the panoramic segmentation image and the target behavior motion information as input to generate the target motion direction knowledge graph.
In this embodiment, a brand new knowledge graph generation method based on pixel clustering is introduced, which specifically includes:
Let the number of semantic categories in the panoramic segmentation image be N and the number of instance objects be M (pixels of the same background category belong to the same instance object), and let each pixel point P_{x,y} be represented by a quadruple (x, y, cls, ins), where x and y are the coordinates of pixel P_{x,y}, cls = 1, 2, ..., N is the semantic category of pixel P_{x,y}, and ins = 1, 2, ..., M is the instance number of pixel P_{x,y}; then:
S131, creating an empty knowledge graph G(V, E), where V and E are the sets of nodes and edges respectively, both initially empty;
S132, selecting any pixel point P_{x,y} as the current clustering focus, creating a cluster v_i = {P_{x,y} | ins ∈ P_{x,y}, ins = i}, adding node v_i to the knowledge graph G, and adding the category and behavior of the clustering focus P_{x,y} to the knowledge graph G as attributes of v_i;
S133, traversing the pixel points P_{x',y'} adjacent to the clustering focus P_{x,y}: if pixel P_{x',y'} belongs to cluster v_i (its instance number satisfies ins = i), adding it to cluster v_i; otherwise, creating the cluster v_j = {P_{x,y} | ins ∈ P_{x,y}, ins = j} to which it belongs, adding node v_j together with its category and behavior as attributes of v_j to the knowledge graph G, and generating an edge e_{ij} between v_i and v_j;
S134, taking the pixel point P_{x',y'} in place of the original clustering focus P_{x,y} as the new clustering focus, and repeating S133 until every pixel belongs to a cluster;
S135, taking the resulting knowledge graph G(V, E) as the target motion direction knowledge graph, in which every node v has a category and a behavior motion direction as attributes.
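For concreteness, a minimal Python sketch of the S131-S135 procedure follows. Its assumptions are not specified in the patent: the panoramic segmentation result is given as two integer arrays ins_map and cls_map of shape (H, W), behavior labels arrive as a dictionary keyed by instance number, adjacency is 4-connectivity, and networkx stands in for the knowledge-graph store. Instead of the pixel-by-pixel focus traversal of S133-S134, it groups pixels per instance number and scans adjacent pixel pairs, which yields the same graph.

import numpy as np
import networkx as nx

def build_motion_knowledge_graph(ins_map, cls_map, behaviors=None):
    """ins_map, cls_map: (H, W) int arrays; behaviors: {instance id: label}."""
    behaviors = behaviors or {}
    height, width = ins_map.shape
    graph = nx.Graph()  # S131: empty knowledge graph G(V, E)
    for i in np.unique(ins_map):
        # S132: one cluster v_i per instance number, with category and
        # behavior motion direction stored as node attributes
        ys, xs = np.nonzero(ins_map == i)
        graph.add_node(int(i),
                       category=int(cls_map[ys[0], xs[0]]),
                       behavior=behaviors.get(int(i)))
    # S133-S134: adjacent pixels with different instance numbers create e_ij
    for y in range(height):
        for x in range(width):
            for dy, dx in ((0, 1), (1, 0)):  # right and down neighbours
                y2, x2 = y + dy, x + dx
                if y2 < height and x2 < width and ins_map[y, x] != ins_map[y2, x2]:
                    graph.add_edge(int(ins_map[y, x]), int(ins_map[y2, x2]))
    return graph  # S135: the target motion direction knowledge graph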
As shown in fig. 5, in the panoramic segmentation stage, a panoramic segmentation model needs to be constructed, comprising a feature extraction network with a cross-attention module, an instance segmentation branch, a semantic segmentation branch, and a Bayesian-decision-based result fusion module.
As shown in fig. 1, in the present embodiment, the panoramic segmentation stage includes:
S200, establishing a panoramic segmentation model based on the remote sensing image, using ResNet as the feature extraction backbone network, and introducing a cross-attention module to extract long-range context information.
As shown in fig. 6, in this embodiment, this specifically includes:
S210, establishing a panoramic segmentation model based on the remote sensing image, using ResNet as the feature extraction backbone network, and denoting the extracted feature map as A ∈ R^{C×H×W}, where C is the number of channels of the feature map and H×W is the feature map size;
S220, applying 1×1 convolutions to feature map A for channel compression, obtaining feature maps P ∈ R^{1×H×W} and Q ∈ R^{1×H×W};
S230, for each pixel u in feature map P, calculating the correlation e_{ui} (i = 1, 2, ..., H×W−1) between u and the pixels μ_1, μ_2, μ_3, ..., μ_{H×W−1} in the same row and column at the corresponding position in feature map Q, normalizing with softmax to obtain the cross-attention x_{ui} of pixel u,
and determining the cross-attention matrix X ∈ R^{(H×W−1)×H×W} from the cross-attention values x_{ui};
S240, generating the feature map E ∈ R^{C×H×W} from the cross-attention matrix X and the feature map A as the output of the cross-attention module:
E = ω(X × A) + A
where ω is a learnable weight parameter.
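As an illustration, a minimal sketch of such a cross-attention module in Python (PyTorch) follows. The tensor names A, P, Q, and E follow the text; the rest is assumption: attention is computed for each pixel over its own row and column, the correlation e_{ui} is taken as the product of the P value at u and the Q values along that row and column (the pixel's own position is not excluded, for simplicity), and the per-pixel loop is written for clarity rather than speed.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.to_p = nn.Conv2d(channels, 1, kernel_size=1)  # S220: P
        self.to_q = nn.Conv2d(channels, 1, kernel_size=1)  # S220: Q
        self.omega = nn.Parameter(torch.zeros(1))          # learnable weight ω

    def forward(self, a):
        # a: (1, C, H, W) feature map A from the ResNet backbone (S210)
        _, _, h, w = a.shape
        p = self.to_p(a).view(h, w)
        q = self.to_q(a).view(h, w)
        out = torch.zeros_like(a)
        for y in range(h):
            for x in range(w):
                # S230: correlations e_ui with same-row and same-column pixels
                e = p[y, x] * torch.cat([q[y, :], q[:, x]])
                attn = F.softmax(e, dim=0)  # cross-attention x_ui
                # aggregate features of A along the row and the column
                ctx = (a[0, :, y, :] * attn[:w]).sum(dim=1) \
                    + (a[0, :, :, x] * attn[w:]).sum(dim=1)
                out[0, :, y, x] = ctx
        return self.omega * out + a  # S240: E = ω(X × A) + A

A practical implementation would vectorize the loop; a recurrence of criss-cross-style row-and-column attention is one known way to approximate full-image context at lower cost.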
S300, performing instance-level segmentation on the remote sensing image according to instance segmentation branches in the panoramic segmentation network, and performing semantic-level segmentation on the remote sensing image according to semantic segmentation branches in the panoramic segmentation network.
In this embodiment, the method specifically includes:
Based on the output E ∈ R^{C×H×W} of the cross-attention module, the instance segmentation branch uses region proposals to perform bounding-box regression on candidate regions, generates the prediction probability of each target category, and generates a segmentation mask within each bounding box through convolution layers, obtaining the instance-level segmentation of the remote sensing image;
based on the output E ∈ R^{C×H×W} of the cross-attention module, the semantic segmentation branch uses convolution layers to extract semantic features from the remote sensing image and generates a segmentation mask for each semantic category, obtaining the semantic-level segmentation of the remote sensing image.
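The patent does not fix the branch architectures; the instance branch as described matches the familiar region-proposal pattern (candidate regions, box regression, per-box mask head, as in Mask R-CNN-style detectors), though no specific detector is named. As one illustration, a minimal semantic branch over the cross-attention output E could look like the following sketch, where the layer sizes are assumptions:

import torch.nn as nn

class SemanticBranch(nn.Module):
    """Per-class segmentation masks from the cross-attention output E."""
    def __init__(self, channels, num_classes):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, num_classes, kernel_size=1),  # one mask per class
        )

    def forward(self, e):
        # e: (B, C, H, W) output of the cross-attention module
        return self.head(e)  # (B, num_classes, H, W) semantic logits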
S400, introducing a Bayesian decision-based branch fusion module, and carrying out decision fusion on the results of the example segmentation branches and the semantic segmentation branches to generate a panoramic segmentation image.
The Bayesian decision fusion targets the pixels where the results of the instance segmentation branch and the semantic segmentation branch conflict; in this embodiment, it specifically includes:
S410, denoting the N semantic categories as {cls_1, cls_2, ..., cls_N} and the classification results of the instance segmentation branch and the semantic segmentation branch at a given pixel as Y = {y_1, y_2}, the posterior probability that y_k belongs to any category is P(cls_i | y_k),
where P(cls_i | y_k), k = 1, 2, are the posterior probabilities of the instance segmentation branch and the semantic segmentation branch for the different semantic categories;
S420, for each pixel, calculating the joint probability distribution under Bayesian theory as:
P(cls_i | Y) = P(cls_i | y_1) P(cls_i | y_2)
S430, determining the pixel category according to the maximum a posteriori criterion, the final semantic category to which the pixel belongs being:
Class(Y) = argmax_i (P(cls_i | Y)), i = 1, 2, ..., N
where Class(Y) is the category to which pixel Y belongs;
S440, applying Bayesian decision fusion to the pixels where the two branch predictions conflict and merging the remaining non-conflicting predictions, thereby obtaining the panoramic segmentation image.
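As an illustration, a minimal sketch of this fusion rule in Python follows, under the assumption (not fixed by the patent) that each branch exposes its per-pixel posteriors as an (N, H, W) probability array; the names inst_prob and sem_prob are illustrative.

import numpy as np

def fuse_branches(inst_prob, sem_prob):
    """inst_prob, sem_prob: (N, H, W) per-pixel posteriors P(cls_i | y_k)."""
    inst_cls = inst_prob.argmax(axis=0)   # hard decision of each branch
    sem_cls = sem_prob.argmax(axis=0)
    # S420: joint posterior P(cls_i | Y) = P(cls_i | y_1) P(cls_i | y_2)
    joint = inst_prob * sem_prob
    fused = joint.argmax(axis=0)          # S430: maximum a posteriori category
    # S440: Bayesian fusion only where the branch predictions conflict;
    # where they agree, the common prediction is kept unchanged
    return np.where(inst_cls == sem_cls, inst_cls, fused)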
S500, carrying out pixel clustering on the panoramic segmentation image to generate a scene information knowledge graph.
In this embodiment, the input of this step is the panoramic segmentation image of the high-resolution remote sensing image generated through Bayesian decision fusion; following the same method as step S130, a scene information knowledge graph that does not contain target behavior motion information is generated.
As shown in fig. 1, in the present embodiment, the knowledge reasoning stage includes:
S600, discriminating the motion direction of the target of interest in the scene information knowledge graph according to the graph attention network.
In this embodiment, this specifically includes:
S610, taking the scene information knowledge graph as input and feeding it into a graph attention network;
S620, for each node in the scene information knowledge graph, calculating the correlation between node v_i and every other node v_j within its neighborhood N_i as the attention coefficient e_{ij}:
e_{ij} = σ(a(Wh_i || Wh_j))
where h_i and h_j are the features of nodes v_i and v_j respectively, a and W are a learnable weight vector and weight matrix, || is the concatenation operation, σ is a nonlinear function, and the neighborhood N_i consists of first-order or second-order neighbors;
S630, to make the attention values comparable across different nodes, normalizing with softmax, the attention matrix is obtained as:
α_{ij} = exp(e_{ij}) / Σ_{k∈N_i} exp(e_{ik})
S640, to obtain a more stable learning effect, introducing a multi-head attention mechanism and using K independent attention heads, the attention features are obtained as:
h′_i = ||_{k=1}^{K} σ( Σ_{j∈N_i} α^k_{ij} W^k h_j )
where h′_i is the output feature of node v_i after the multi-head attention mechanism, and α^k and W^k are the attention matrix and the weight matrix on attention head k;
S650, extracting the feature information of the scene information knowledge graph with the graph attention network and reasoning about the behavior motion information of the target of interest.
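As an illustration, a minimal sketch of S620-S640 in Python (PyTorch) follows, with a dense 0/1 adjacency mask standing in for the neighborhood N_i; LeakyReLU as the nonlinearity σ, ELU between heads, and the feature sizes are assumptions, not choices fixed by the patent.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionHead(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)   # weight matrix W
        self.a = nn.Linear(2 * out_dim, 1, bias=False)    # weight vector a

    def forward(self, h, adj):
        # h: (n, in_dim) node features; adj: (n, n) 0/1 neighbourhood mask N_i
        wh = self.W(h)                                    # Wh_i
        n = wh.size(0)
        # S620: e_ij = sigma(a(Wh_i || Wh_j)) for every node pair
        pairs = torch.cat([wh.unsqueeze(1).expand(n, n, -1),
                           wh.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1))
        # S630: softmax restricted to each node's neighbourhood
        e = e.masked_fill(adj == 0, float("-inf"))
        alpha = torch.softmax(e, dim=1)                   # attention matrix
        return alpha @ wh                                 # aggregated features

class MultiHeadGAT(nn.Module):
    def __init__(self, in_dim, out_dim, k=4):
        super().__init__()
        self.heads = nn.ModuleList(GraphAttentionHead(in_dim, out_dim)
                                   for _ in range(k))

    def forward(self, h, adj):
        # S640: h'_i concatenates the K heads through a nonlinearity
        return torch.cat([F.elu(head(h, adj)) for head in self.heads], dim=-1)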
In summary, the remote sensing image target motion direction discrimination method designs a pixel-clustering-based knowledge graph generation method so that a target motion direction knowledge graph is generated quickly from the panoramic segmentation image of the high-resolution remote sensing image; by introducing the cross-attention module, the long-range context extraction capability of the feature extraction network is enhanced; by constructing the panoramic segmentation model, instance-level and semantic-level segmentations of the high-resolution remote sensing image are obtained respectively; by introducing the Bayesian-decision-based branch fusion module, the conflicts that arise when the two branch results are fused are resolved and a high-precision panoramic segmentation result is obtained, achieving the goal of scene segmentation; by constructing a graph attention network, attention is focused on the explicit and latent associations between targets, and between targets and the environment, in the scene, realizing behavior motion direction reasoning for the target of interest; the remote sensing image target motion direction discriminating method provided by the embodiment of the invention can rapidly understand the scene environment of key areas of interest, intelligently infer the motion direction of the target of interest, and realize highly timely, high-precision, and highly intelligent target motion direction discrimination for high-resolution remote sensing images.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (7)

1. A remote sensing image target motion direction discriminating method, characterized by comprising the following steps:
S100, carrying out panoramic segmentation labeling and target behavior motion direction labeling on the remote sensing image, and determining a target motion direction knowledge graph according to the labeling result;
S200, establishing a panoramic segmentation model based on the remote sensing image, using ResNet as the feature extraction backbone network, and introducing a cross-attention module to extract long-range context information;
S300, performing instance-level segmentation on the remote sensing image according to the instance segmentation branch of the panoramic segmentation network, and performing semantic-level segmentation on the remote sensing image according to the semantic segmentation branch of the panoramic segmentation network;
S400, introducing a Bayesian-decision-based branch fusion module, and carrying out decision fusion on the results of the instance segmentation branch and the semantic segmentation branch to generate a panoramic segmentation image;
S500, carrying out pixel clustering on the panoramic segmentation image to generate a scene information knowledge graph;
S600, discriminating the motion direction of the target of interest in the scene information knowledge graph according to the graph attention network;
wherein the S100 includes:
S110, carrying out instance-level labeling on foreground targets and semantic-level labeling on background targets;
S120, additionally labeling behavior motion information for the targets of interest;
S130, using a pixel-clustering-based knowledge graph generation method, taking the panoramic segmentation image and the target behavior motion information as input to generate the target motion direction knowledge graph.
2. The method according to claim 1, wherein in S110:
performing instance-level labeling on the foreground targets, including pixel-level classification on the foreground targets according to semantic categories, and dividing the instances so that different foreground targets in the same semantic category have different instance numbers;
carrying out semantic level labeling on background targets, including carrying out pixel level classification on different background targets according to semantic categories;
the S130 includes:
Let the number of semantic categories in the panoramic segmentation image be N and the number of instance objects be M, and let each pixel point P_{x,y} be represented by a quadruple (x, y, cls, ins), where x and y are the coordinates of pixel P_{x,y}, cls = 1, 2, ..., N is the semantic category of pixel P_{x,y}, and ins = 1, 2, ..., M is the instance number of pixel P_{x,y}; then:
S131, creating an empty knowledge graph G(V, E), where V and E are the sets of nodes and edges respectively, both initially empty;
S132, selecting any pixel point P_{x,y} as the current clustering focus, creating a cluster v_i = {P_{x,y} | ins ∈ P_{x,y}, ins = i}, adding node v_i to the knowledge graph G, and adding the category and behavior of the clustering focus P_{x,y} to the knowledge graph G as attributes of v_i;
S133, traversing the pixel points P_{x',y'} adjacent to the clustering focus P_{x,y}: if pixel P_{x',y'} belongs to cluster v_i (its instance number satisfies ins = i), adding it to cluster v_i; otherwise, creating the cluster v_j = {P_{x,y} | ins ∈ P_{x,y}, ins = j} to which it belongs, adding node v_j together with its category and behavior as attributes of v_j to the knowledge graph G, and generating an edge e_{ij} between v_i and v_j;
S134, taking the pixel point P_{x',y'} in place of the original clustering focus P_{x,y} as the new clustering focus, and repeating S133 until every pixel belongs to a cluster;
S135, taking the resulting knowledge graph G(V, E) as the target motion direction knowledge graph, in which every node v has a category and a behavior motion direction as attributes.
3. The method according to claim 2, wherein S200 includes:
S210, establishing a panoramic segmentation model based on the remote sensing image, using ResNet as the feature extraction backbone network, and denoting the extracted feature map as A ∈ R^{C×H×W}, where C is the number of channels of the feature map and H×W is the size of the feature map;
S220, applying 1×1 convolutions to feature map A for channel compression, obtaining feature maps P ∈ R^{1×H×W} and Q ∈ R^{1×H×W};
S230, for each pixel u in feature map P, calculating the correlation e_{ui} (i = 1, 2, ..., H×W−1) between u and the pixels μ_1, μ_2, μ_3, ..., μ_{H×W−1} in the same row and column at the corresponding position in feature map Q, normalizing with softmax to obtain the cross-attention x_{ui} of pixel u,
and determining the cross-attention matrix X ∈ R^{(H×W−1)×H×W} from the cross-attention values x_{ui};
S240, generating the feature map E ∈ R^{C×H×W} from the cross-attention matrix X and the feature map A as the output of the cross-attention module:
E = ω(X × A) + A
where ω is a learnable weight parameter.
4. The method according to claim 3, wherein the step S300 includes:
based on the output E ∈ R^{C×H×W} of the cross-attention module, the instance segmentation branch uses region proposals to perform bounding-box regression on candidate regions, generates the prediction probability of each target category, and generates a segmentation mask within each bounding box through convolution layers, obtaining the instance-level segmentation of the remote sensing image;
based on the output E ∈ R^{C×H×W} of the cross-attention module, the semantic segmentation branch uses convolution layers to extract semantic features from the remote sensing image and generates a segmentation mask for each semantic category, obtaining the semantic-level segmentation of the remote sensing image.
5. The method of claim 4, wherein S400 includes:
S410, denoting the N semantic categories as {cls_1, cls_2, ..., cls_N} and the classification results of the instance segmentation branch and the semantic segmentation branch at a given pixel as Y = {y_1, y_2}, the posterior probability that y_k belongs to any category is P(cls_i | y_k),
where P(cls_i | y_k), k = 1, 2, are the posterior probabilities of the instance segmentation branch and the semantic segmentation branch for the different semantic categories;
S420, for each pixel, calculating the joint probability distribution under Bayesian theory as:
P(cls_i | Y) = P(cls_i | y_1) P(cls_i | y_2)
S430, determining the pixel category according to the maximum a posteriori criterion, the final semantic category to which the pixel belongs being:
Class(Y) = argmax_i (P(cls_i | Y)), i = 1, 2, ..., N
where Class(Y) is the category to which pixel Y belongs;
S440, applying Bayesian decision fusion to the pixels where the two branch predictions conflict and merging the remaining non-conflicting predictions, thereby obtaining the panoramic segmentation image.
6. The method of claim 5, wherein S500 includes:
using the panoramic segmentation image as input, generating a scene information knowledge graph that does not contain target behavior motion information according to the method of S130.
7. The method according to claim 5, wherein the step S600 includes:
S610, taking the scene information knowledge graph as input and feeding it into a graph attention network;
S620, for each node in the scene information knowledge graph, calculating the correlation between node v_i and every other node v_j within its neighborhood N_i as the attention coefficient e_{ij}:
e_{ij} = σ(a(Wh_i || Wh_j))
where h_i and h_j are the features of nodes v_i and v_j respectively, a and W are a learnable weight vector and weight matrix, || is the concatenation operation, σ is a nonlinear function, and the neighborhood N_i consists of first-order or second-order neighbors;
S630, normalizing with softmax, the attention matrix is obtained as:
α_{ij} = exp(e_{ij}) / Σ_{k∈N_i} exp(e_{ik})
S640, introducing a multi-head attention mechanism and using K independent attention heads, the attention features are obtained as:
h′_i = ||_{k=1}^{K} σ( Σ_{j∈N_i} α^k_{ij} W^k h_j )
where h′_i is the output feature of node v_i after the multi-head attention mechanism, and α^k and W^k are the attention matrix and the weight matrix on attention head k;
S650, extracting the feature information of the scene information knowledge graph with the graph attention network and reasoning about the behavior motion information of the target of interest.
CN202310477115.9A 2023-04-27 2023-04-27 Remote sensing image target motion direction discriminating method Active CN116486169B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310477115.9A CN116486169B (en) 2023-04-27 2023-04-27 Remote sensing image target motion direction discriminating method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310477115.9A CN116486169B (en) 2023-04-27 2023-04-27 Remote sensing image target motion direction discriminating method

Publications (2)

Publication Number Publication Date
CN116486169A CN116486169A (en) 2023-07-25
CN116486169B (en) 2023-12-19

Family

ID=87211645

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310477115.9A Active CN116486169B (en) 2023-04-27 2023-04-27 Remote sensing image target motion direction discriminating method

Country Status (1)

Country Link
CN (1) CN116486169B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103561261A (en) * 2013-10-12 2014-02-05 重庆邮电大学 Panoramic locatable video coding method based on visual attention
CN113780149A (en) * 2021-09-07 2021-12-10 北京航空航天大学 Method for efficiently extracting building target of remote sensing image based on attention mechanism
CN115100652A (en) * 2022-08-02 2022-09-23 北京卫星信息工程研究所 Electronic map automatic generation method based on high-resolution remote sensing image
CN115115939A (en) * 2022-07-28 2022-09-27 北京卫星信息工程研究所 Remote sensing image target fine-grained identification method based on characteristic attention mechanism
CN115908908A (en) * 2022-11-14 2023-04-04 北京卫星信息工程研究所 Remote sensing image gathering type target identification method and device based on graph attention network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10335091B2 (en) * 2014-03-19 2019-07-02 Tactonic Technologies, Llc Method and apparatus to infer object and agent properties, activity capacities, behaviors, and intents from contact and pressure images
US11030486B2 (en) * 2018-04-20 2021-06-08 XNOR.ai, Inc. Image classification through label progression

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103561261A (en) * 2013-10-12 2014-02-05 重庆邮电大学 Panoramic locatable video coding method based on visual attention
CN113780149A (en) * 2021-09-07 2021-12-10 北京航空航天大学 Method for efficiently extracting building target of remote sensing image based on attention mechanism
CN115115939A (en) * 2022-07-28 2022-09-27 北京卫星信息工程研究所 Remote sensing image target fine-grained identification method based on characteristic attention mechanism
CN115100652A (en) * 2022-08-02 2022-09-23 北京卫星信息工程研究所 Electronic map automatic generation method based on high-resolution remote sensing image
CN115908908A (en) * 2022-11-14 2023-04-04 北京卫星信息工程研究所 Remote sensing image gathering type target identification method and device based on graph attention network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
深度卷积神经网络图像语义分割研究进展 [Research progress of image semantic segmentation based on deep convolutional neural networks]; 青晨, 禹晶, 肖创柏, 段娟; 中国图象图形学报 (Journal of Image and Graphics), No. 06; full text *

Also Published As

Publication number Publication date
CN116486169A (en) 2023-07-25

Similar Documents

Publication Publication Date Title
CN110276765B (en) Image panorama segmentation method based on multitask learning deep neural network
CN109685067B (en) Image semantic segmentation method based on region and depth residual error network
CN110313017B (en) Machine vision method for classifying input data based on object components
CN111462282A (en) Scene graph generation method
US8170283B2 (en) Video surveillance system configured to analyze complex behaviors using alternating layers of clustering and sequencing
CN110633632A (en) Weak supervision combined target detection and semantic segmentation method based on loop guidance
CN110580499B (en) Deep learning target detection method and system based on crowdsourcing repeated labels
Li et al. Transmission line detection in aerial images: An instance segmentation approach based on multitask neural networks
CN112966555A (en) Remote sensing image airplane identification method based on deep learning and component prior
CN114972758B (en) Instance segmentation method based on point cloud weak supervision
CN112766409A (en) Feature fusion method for remote sensing image target detection
Wang et al. Joint-learning segmentation in Internet of drones (IoD)-based monitor systems
CN115984537A (en) Image processing method and device and related equipment
CN116363374A (en) Image semantic segmentation network continuous learning method, system, equipment and storage medium
CN116863384A (en) CNN-Transfomer-based self-supervision video segmentation method and system
CN117690098B (en) Multi-label identification method based on dynamic graph convolution under open driving scene
Farah et al. Interpretation of multisensor remote sensing images: Multiapproach fusion of uncertain information
Ahmadi et al. A hybrid of inference and stacked classifiers to indoor scenes classification of rgb-d images
Yasmin et al. Small obstacles detection on roads scenes using semantic segmentation for the safe navigation of autonomous vehicles
CN116486169B (en) Remote sensing image target motion direction discriminating method
Saravanarajan et al. Improving semantic segmentation under hazy weather for autonomous vehicles using explainable artificial intelligence and adaptive dehazing approach
Koch et al. Estimating Object Perception Performance in Aerial Imagery Using a Bayesian Approach
Li et al. Real-time 3D object proposal generation and classification using limited processing resources
CN114821188A (en) Image processing method, training method of scene graph generation model and electronic equipment
Ho NBDT: Neural-backed decision trees

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant