CN116486169B - Remote sensing image target motion direction discriminating method - Google Patents

Remote sensing image target motion direction discriminating method

Info

Publication number
CN116486169B
CN116486169B CN202310477115.9A
Authority
CN
China
Prior art keywords
segmentation
attention
semantic
target
remote sensing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310477115.9A
Other languages
Chinese (zh)
Other versions
CN116486169A (en)
Inventor
李梓桢
金世超
贺广均
冯鹏铭
梁颖
陈千千
上官博屹
常江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Satellite Information Engineering
Original Assignee
Beijing Institute of Satellite Information Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Satellite Information Engineering filed Critical Beijing Institute of Satellite Information Engineering
Priority to CN202310477115.9A priority Critical patent/CN116486169B/en
Publication of CN116486169A publication Critical patent/CN116486169A/en
Application granted granted Critical
Publication of CN116486169B publication Critical patent/CN116486169B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Animal Behavior & Ethology (AREA)

Abstract

The invention relates to a remote sensing image target motion direction discriminating method, which comprises the following steps: S100, carrying out panoramic segmentation labeling and target behavior motion direction labeling on a remote sensing image, and determining a target motion direction knowledge graph; S200, establishing a panoramic segmentation model based on the remote sensing image, using ResNet as the feature extraction backbone network, and introducing a cross-attention module to extract long-range context information; S300, performing instance-level segmentation on the remote sensing image according to the instance segmentation branch of the panoramic segmentation network, and performing semantic-level segmentation according to the semantic segmentation branch; S400, introducing a Bayesian-decision-based branch fusion module and carrying out decision fusion on the results of the instance segmentation branch and the semantic segmentation branch to generate a panoramic segmentation image; S500, carrying out pixel clustering on the panoramic segmentation image to generate a scene information knowledge graph; S600, discriminating the motion direction of the target of interest in the scene information knowledge graph according to the graph attention network. The invention can infer the behavior and motion information of targets in remote sensing images.

Description

Remote sensing image target motion direction discriminating method
Technical Field
The invention relates to the technical field of remote sensing, and in particular to a remote sensing image target motion direction discriminating method.
Background
Target motion direction discrimination for high-resolution remote sensing images refers to judging, for targets such as aircraft, ships, and vehicles in a remote sensing image, their motion state (for example taxiing, taking off, or parking) according to the explicit or latent interaction between the target and its environment or between the target and other targets.
Existing approaches fall into two categories. The first is manual interpretation, which requires related personnel to have extensive knowledge and experience, and suffers from poor real-time performance, high labor and time consumption, high cost, and susceptibility to subjective judgment. The second is discrimination based on time-series images; although this approach has been successful in the autonomous driving field, remote sensing images differ from driving imagery in that continuous image sequences are difficult to obtain, and they additionally exhibit small target scales and complex scene environments, so motion discrimination methods from the intelligent driving field cannot be applied directly.
Intelligently extracting the key features of ground-object targets from high-resolution remote sensing images and reasoning over them with remote sensing image panoramic segmentation and graph attention network inference techniques enables behavior and motion direction judgment for targets of interest in high-resolution remote sensing images, and therefore has important theoretical research and application value.
Disclosure of Invention
In view of this, the present invention aims to provide a remote sensing image target motion direction discriminating method, which obtains scene and target information by performing panoramic segmentation on the remote sensing image, generates a scene information knowledge graph using a knowledge graph generation method, and performs knowledge reasoning with a graph attention network to obtain the behavior and motion direction information of the target.
The remote sensing image target motion direction discriminating method provided by the embodiment of the invention comprises the following steps:
S100, carrying out panoramic segmentation labeling and target behavior motion direction labeling on the remote sensing image, and determining a target motion direction knowledge graph according to the labeling result;
S200, establishing a panoramic segmentation model based on the remote sensing image, using ResNet as the feature extraction backbone network, and introducing a cross-attention module to extract long-range context information;
S300, performing instance-level segmentation on the remote sensing image according to the instance segmentation branch of the panoramic segmentation network, and performing semantic-level segmentation on the remote sensing image according to the semantic segmentation branch of the panoramic segmentation network;
S400, introducing a Bayesian-decision-based branch fusion module, and carrying out decision fusion on the results of the instance segmentation branch and the semantic segmentation branch to generate a panoramic segmentation image;
S500, carrying out pixel clustering on the panoramic segmentation image to generate a scene information knowledge graph;
S600, discriminating the motion direction of the target of interest in the scene information knowledge graph according to the graph attention network.
Preferably, the S100 includes:
S110, carrying out instance-level labeling on foreground targets and semantic-level labeling on background targets;
S120, additionally labeling behavior motion information for the targets of interest;
S130, using a pixel-clustering-based knowledge graph generation method, taking the panoramic segmentation image and the target behavior motion information as input to generate the target motion direction knowledge graph.
Preferably, in S110:
performing instance-level labeling on the foreground targets, including pixel-level classification on the foreground targets according to semantic categories, and dividing the instances so that different foreground targets in the same semantic category have different instance numbers;
carrying out semantic level labeling on background targets, including carrying out pixel level classification on different background targets according to semantic categories;
the S130 includes:
Let the number of semantic categories in the panoramic segmentation image be N and the number of instance objects be M, and let each pixel point P_{x,y} be represented by a quadruple (x, y, cls, ins), where x and y are the coordinates of pixel P_{x,y}, cls = 1, 2, ..., N is the semantic category of pixel P_{x,y}, and ins = 1, 2, ..., M is the instance number of pixel P_{x,y}; then:
S131, creating an empty knowledge graph G(V, E), where V and E are the sets of nodes and edges respectively, both initially empty;
S132, selecting any pixel point P_{x,y} as the current clustering focus, creating a cluster v_i = {P_{x,y} | ins ∈ P_{x,y}, ins = i}, adding node v_i to the knowledge graph G, and adding the category and behavior of the clustering focus P_{x,y} to the knowledge graph G as attributes of v_i;
S133, traversing the pixel points P_{x',y'} adjacent to the clustering focus P_{x,y}: if pixel P_{x',y'} belongs to cluster v_i (its instance number satisfies ins = i), adding it to cluster v_i; otherwise, creating the cluster v_j = {P_{x,y} | ins ∈ P_{x,y}, ins = j} to which it belongs, adding node v_j together with its category and behavior as attributes of v_j to the knowledge graph G, and generating an edge e_{ij} between v_i and v_j;
S134, taking the pixel point P_{x',y'} in place of the original clustering focus P_{x,y} as the new clustering focus, and repeating S133 until every pixel belongs to a cluster;
S135, taking the resulting knowledge graph G(V, E) as the target motion direction knowledge graph, in which every node v has a category and a behavior motion direction as attributes.
Preferably, the S200 includes:
S210, establishing a panoramic segmentation model based on the remote sensing image, using ResNet as the feature extraction backbone network, and denoting the extracted feature map as A ∈ R^{C×H×W}, where C is the number of channels of the feature map and H×W is the size of the feature map;
S220, applying 1×1 convolutions to feature map A for channel compression, obtaining feature maps P ∈ R^{1×H×W} and Q ∈ R^{1×H×W};
S230, for each pixel u in feature map P, calculating the correlation e_{ui} (i = 1, 2, ..., H×W−1) between u and the pixels μ_1, μ_2, μ_3, ..., μ_{H×W−1} in the same row and column at the corresponding position in feature map Q, normalizing with softmax to obtain the cross-attention x_{ui} of pixel u,
and determining the cross-attention matrix X ∈ R^{(H×W−1)×H×W} from the cross-attention values x_{ui};
S240, generating the feature map E ∈ R^{C×H×W} from the cross-attention matrix X and the feature map A as the output of the cross-attention module:
E = ω(X × A) + A
where ω is a learnable weight parameter.
Preferably, the S300 includes:
Based on the output E ∈ R^{C×H×W} of the cross-attention module, the instance segmentation branch uses region proposals to perform bounding-box regression on candidate regions, generates the prediction probability of each target category, and generates a segmentation mask within each bounding box through convolution layers, obtaining the instance-level segmentation of the remote sensing image;
based on the output E ∈ R^{C×H×W} of the cross-attention module, the semantic segmentation branch uses convolution layers to extract semantic features from the remote sensing image and generates a segmentation mask for each semantic category, obtaining the semantic-level segmentation of the remote sensing image.
Preferably, the S400 includes:
S410, denoting the N semantic categories as {cls_1, cls_2, ..., cls_N} and the classification results of the instance segmentation branch and the semantic segmentation branch at a given pixel as Y = {y_1, y_2}, the posterior probability that y_k belongs to any category is P(cls_i | y_k),
where P(cls_i | y_k), k = 1, 2, are the posterior probabilities of the instance segmentation branch and the semantic segmentation branch for the different semantic categories;
S420, for each pixel, calculating the joint probability distribution under Bayesian theory as:
P(cls_i | Y) = P(cls_i | y_1) P(cls_i | y_2)
S430, determining the pixel category according to the maximum a posteriori criterion, the final semantic category to which the pixel belongs being:
Class(Y) = argmax_i (P(cls_i | Y)), i = 1, 2, ..., N
where Class(Y) is the category to which pixel Y belongs;
S440, applying Bayesian decision fusion to the pixels where the two branch predictions conflict and merging the remaining non-conflicting predictions, thereby obtaining the panoramic segmentation image.
Preferably, the S500 includes:
using the panoramic segmentation image as input, generating a scene information knowledge graph that does not contain target behavior motion information according to the method of S130.
Preferably, the S600 includes:
S610, taking the scene information knowledge graph as input and feeding it into a graph attention network;
S620, for each node in the scene information knowledge graph, calculating the correlation between node v_i and every other node v_j within its neighborhood N_i as the attention coefficient e_{ij}:
e_{ij} = σ(a(Wh_i || Wh_j))
where h_i and h_j are the features of nodes v_i and v_j respectively, a and W are a learnable weight vector and weight matrix, || is the concatenation operation, σ is a nonlinear function, and the neighborhood N_i consists of first-order or second-order neighbors;
S630, normalizing with softmax, the attention matrix is obtained as:
α_{ij} = exp(e_{ij}) / Σ_{k∈N_i} exp(e_{ik})
S640, introducing a multi-head attention mechanism and using K independent attention heads, the attention features are obtained as:
h′_i = ||_{k=1}^{K} σ( Σ_{j∈N_i} α^k_{ij} W^k h_j )
where h′_i is the output feature of node v_i after the multi-head attention mechanism, and α^k and W^k are the attention matrix and the weight matrix on attention head k;
S650, extracting the feature information of the scene information knowledge graph with the graph attention network and reasoning about the behavior motion information of the target of interest.
According to the embodiment of the invention, a pixel-clustering-based knowledge graph generation method is designed so that the target motion direction knowledge graph is generated quickly from the panoramic segmentation image of the high-resolution remote sensing image; by introducing the cross-attention module, the long-range context extraction capability of the feature extraction network is enhanced; by constructing the panoramic segmentation model, instance-level and semantic-level segmentations of the high-resolution remote sensing image are obtained respectively; by introducing the Bayesian-decision-based branch fusion module, the conflicts that arise when the two branch results are fused are resolved and a high-precision panoramic segmentation result is obtained, achieving the goal of scene segmentation; by constructing a graph attention network, attention is focused on the explicit and latent associations between targets, and between targets and the environment, in the scene, realizing behavior motion direction reasoning for the target of interest; the remote sensing image target motion direction discriminating method provided by the embodiment of the invention can rapidly understand the scene environment of key areas of interest, intelligently infer the motion direction of the target of interest, and realize highly timely, high-precision, and highly intelligent target motion direction discrimination for high-resolution remote sensing images.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a remote sensing image target motion direction judging method according to an embodiment of the invention;
fig. 2 is a schematic diagram of a remote sensing image target motion direction discrimination method according to an embodiment of the present invention;
fig. 3 (1)-(3) are schematic diagrams of the steps of the data generation stage of the remote sensing image target motion direction discriminating method according to an embodiment of the present invention;
FIG. 4 is a schematic sub-flowchart of a remote sensing image target motion direction determination method according to an embodiment of the present invention;
FIG. 5 is a schematic view of a panoramic segmentation model according to an embodiment of the present invention;
fig. 6 is a schematic diagram of a cross-attention module according to an embodiment of the present invention.
Detailed Description
The description of the embodiments of this specification should be read in conjunction with the accompanying drawings, which are to be considered part of the complete written description. In the drawings, the shape or thickness of features may be enlarged or indicated simply for convenience. Furthermore, parts of the structures in the drawings are described separately; it should be noted that elements not shown or described in the drawings are of a form known to those of ordinary skill in the art.
Any references to directions and orientations in the description of the embodiments herein are for convenience only and should not be construed as limiting the scope of the invention in any way. The following description of the preferred embodiments will refer to combinations of features, which may be present alone or in combination, and the invention is not particularly limited to the preferred embodiments. The scope of the invention is defined by the claims.
As shown in fig. 1 and fig. 2, the remote sensing image target motion direction discrimination method in the embodiment of the invention takes a pixel-clustering-based knowledge graph generation method as its core, organically combines high-resolution remote sensing image panoramic segmentation with graph attention network reasoning, and can be divided into three stages: data generation, panoramic segmentation, and knowledge reasoning. The data generation stage comprises step S100, the panoramic segmentation stage comprises steps S200-S500, and the knowledge reasoning stage comprises step S600.
As shown in fig. 2 and 3, in the data generation stage, the panoramic segmentation labels of the high-resolution remote sensing image are taken as input; these include both pixel-level classification of the image and instance-level segmentation of the "foreground" targets of interest. For the targets of interest in the panoramic segmentation image, additional target behavior motion direction labeling is needed, after which a target motion direction knowledge graph is generated using the pixel-clustering-based knowledge graph generation method. Fig. 3 uses an airport aircraft target as an illustration for ease of understanding, but this is not intended to limit the present embodiment.
As shown in fig. 1, the data generation stage includes:
S100, carrying out panoramic segmentation labeling and target behavior motion direction labeling on the remote sensing image, and determining a target motion direction knowledge graph according to the labeling result.
As shown in fig. 4, in the present embodiment, S100 specifically includes:
S110, performing instance-level labeling on the foreground targets and semantic-level labeling on the background targets.
The foreground targets are given instance-level labels: they are classified at the pixel level according to semantic category and divided into instances, so that different foreground targets within the same semantic category receive different instance numbers. Typical foreground targets include aircraft, vehicles, and ships.
The background targets are given semantic-level labels: different background targets are classified at the pixel level according to semantic category, and background targets are not divided into instances. Typical background targets are tarmac, runways, and vegetation.
S120, additionally labeling behavior motion information for the targets of interest.
Taking the case where the target of interest is an aircraft, the behavior motion information may include motion information for behaviors such as take-off, landing, taxiing, and berthing; the behavior motion information of other targets (i.e., targets not of particular interest) and of the background defaults to null.
S130, using the pixel-clustering-based knowledge graph generation method, taking the panoramic segmentation image and the target behavior motion information as input to generate the target motion direction knowledge graph.
In this embodiment, a brand new knowledge graph generation method based on pixel clustering is introduced, which specifically includes:
Let the number of semantic categories in the panoramic segmentation image be N and the number of instance objects be M (pixels of the same background category belong to the same instance object), and let each pixel point P_{x,y} be represented by a quadruple (x, y, cls, ins), where x and y are the coordinates of pixel P_{x,y}, cls = 1, 2, ..., N is the semantic category of pixel P_{x,y}, and ins = 1, 2, ..., M is the instance number of pixel P_{x,y}; then:
S131, creating an empty knowledge graph G(V, E), where V and E are the sets of nodes and edges respectively, both initially empty;
S132, selecting any pixel point P_{x,y} as the current clustering focus, creating a cluster v_i = {P_{x,y} | ins ∈ P_{x,y}, ins = i}, adding node v_i to the knowledge graph G, and adding the category and behavior of the clustering focus P_{x,y} to the knowledge graph G as attributes of v_i;
S133, traversing the pixel points P_{x',y'} adjacent to the clustering focus P_{x,y}: if pixel P_{x',y'} belongs to cluster v_i (its instance number satisfies ins = i), adding it to cluster v_i; otherwise, creating the cluster v_j = {P_{x,y} | ins ∈ P_{x,y}, ins = j} to which it belongs, adding node v_j together with its category and behavior as attributes of v_j to the knowledge graph G, and generating an edge e_{ij} between v_i and v_j;
S134, taking the pixel point P_{x',y'} in place of the original clustering focus P_{x,y} as the new clustering focus, and repeating S133 until every pixel belongs to a cluster;
S135, taking the resulting knowledge graph G(V, E) as the target motion direction knowledge graph, in which every node v has a category and a behavior motion direction as attributes.
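For concreteness, a minimal Python sketch of the S131-S135 procedure follows. Its assumptions are not specified in the patent: the panoramic segmentation result is given as two integer arrays ins_map and cls_map of shape (H, W), behavior labels arrive as a dictionary keyed by instance number, adjacency is 4-connectivity, and networkx stands in for the knowledge-graph store. Instead of the pixel-by-pixel focus traversal of S133-S134, it groups pixels per instance number and scans adjacent pixel pairs, which yields the same graph.

import numpy as np
import networkx as nx

def build_motion_knowledge_graph(ins_map, cls_map, behaviors=None):
    """ins_map, cls_map: (H, W) int arrays; behaviors: {instance id: label}."""
    behaviors = behaviors or {}
    height, width = ins_map.shape
    graph = nx.Graph()  # S131: empty knowledge graph G(V, E)
    for i in np.unique(ins_map):
        # S132: one cluster v_i per instance number, with category and
        # behavior motion direction stored as node attributes
        ys, xs = np.nonzero(ins_map == i)
        graph.add_node(int(i),
                       category=int(cls_map[ys[0], xs[0]]),
                       behavior=behaviors.get(int(i)))
    # S133-S134: adjacent pixels with different instance numbers create e_ij
    for y in range(height):
        for x in range(width):
            for dy, dx in ((0, 1), (1, 0)):  # right and down neighbours
                y2, x2 = y + dy, x + dx
                if y2 < height and x2 < width and ins_map[y, x] != ins_map[y2, x2]:
                    graph.add_edge(int(ins_map[y, x]), int(ins_map[y2, x2]))
    return graph  # S135: the target motion direction knowledge graph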
As shown in fig. 5, in the panoramic segmentation stage, a panoramic segmentation model needs to be constructed, comprising a feature extraction network with a cross-attention module, an instance segmentation branch, a semantic segmentation branch, and a Bayesian-decision-based result fusion module.
As shown in fig. 1, in the present embodiment, the panoramic segmentation stage includes:
S200, establishing a panoramic segmentation model based on the remote sensing image, using ResNet as the feature extraction backbone network, and introducing a cross-attention module to extract long-range context information.
As shown in fig. 6, in this embodiment, this specifically includes:
S210, establishing a panoramic segmentation model based on the remote sensing image, using ResNet as the feature extraction backbone network, and denoting the extracted feature map as A ∈ R^{C×H×W}, where C is the number of channels of the feature map and H×W is the feature map size;
S220, applying 1×1 convolutions to feature map A for channel compression, obtaining feature maps P ∈ R^{1×H×W} and Q ∈ R^{1×H×W};
S230, for each pixel u in feature map P, calculating the correlation e_{ui} (i = 1, 2, ..., H×W−1) between u and the pixels μ_1, μ_2, μ_3, ..., μ_{H×W−1} in the same row and column at the corresponding position in feature map Q, normalizing with softmax to obtain the cross-attention x_{ui} of pixel u,
and determining the cross-attention matrix X ∈ R^{(H×W−1)×H×W} from the cross-attention values x_{ui};
S240, generating the feature map E ∈ R^{C×H×W} from the cross-attention matrix X and the feature map A as the output of the cross-attention module:
E = ω(X × A) + A
where ω is a learnable weight parameter.
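As an illustration, a minimal sketch of such a cross-attention module in Python (PyTorch) follows. The tensor names A, P, Q, and E follow the text; the rest is assumption: attention is computed for each pixel over its own row and column, the correlation e_{ui} is taken as the product of the P value at u and the Q values along that row and column (the pixel's own position is not excluded, for simplicity), and the per-pixel loop is written for clarity rather than speed.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.to_p = nn.Conv2d(channels, 1, kernel_size=1)  # S220: P
        self.to_q = nn.Conv2d(channels, 1, kernel_size=1)  # S220: Q
        self.omega = nn.Parameter(torch.zeros(1))          # learnable weight ω

    def forward(self, a):
        # a: (1, C, H, W) feature map A from the ResNet backbone (S210)
        _, _, h, w = a.shape
        p = self.to_p(a).view(h, w)
        q = self.to_q(a).view(h, w)
        out = torch.zeros_like(a)
        for y in range(h):
            for x in range(w):
                # S230: correlations e_ui with same-row and same-column pixels
                e = p[y, x] * torch.cat([q[y, :], q[:, x]])
                attn = F.softmax(e, dim=0)  # cross-attention x_ui
                # aggregate features of A along the row and the column
                ctx = (a[0, :, y, :] * attn[:w]).sum(dim=1) \
                    + (a[0, :, :, x] * attn[w:]).sum(dim=1)
                out[0, :, y, x] = ctx
        return self.omega * out + a  # S240: E = ω(X × A) + A

A practical implementation would vectorize the loop; a recurrence of criss-cross-style row-and-column attention is one known way to approximate full-image context at lower cost.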
S300, performing instance-level segmentation on the remote sensing image according to instance segmentation branches in the panoramic segmentation network, and performing semantic-level segmentation on the remote sensing image according to semantic segmentation branches in the panoramic segmentation network.
In this embodiment, the method specifically includes:
Based on the output E ∈ R^{C×H×W} of the cross-attention module, the instance segmentation branch uses region proposals to perform bounding-box regression on candidate regions, generates the prediction probability of each target category, and generates a segmentation mask within each bounding box through convolution layers, obtaining the instance-level segmentation of the remote sensing image;
based on the output E ∈ R^{C×H×W} of the cross-attention module, the semantic segmentation branch uses convolution layers to extract semantic features from the remote sensing image and generates a segmentation mask for each semantic category, obtaining the semantic-level segmentation of the remote sensing image.
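The patent does not fix the branch architectures; the instance branch as described matches the familiar region-proposal pattern (candidate regions, box regression, per-box mask head, as in Mask R-CNN-style detectors), though no specific detector is named. As one illustration, a minimal semantic branch over the cross-attention output E could look like the following sketch, where the layer sizes are assumptions:

import torch.nn as nn

class SemanticBranch(nn.Module):
    """Per-class segmentation masks from the cross-attention output E."""
    def __init__(self, channels, num_classes):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, num_classes, kernel_size=1),  # one mask per class
        )

    def forward(self, e):
        # e: (B, C, H, W) output of the cross-attention module
        return self.head(e)  # (B, num_classes, H, W) semantic logits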
S400, introducing a Bayesian decision-based branch fusion module, and carrying out decision fusion on the results of the example segmentation branches and the semantic segmentation branches to generate a panoramic segmentation image.
The Bayesian decision fusion targets the pixels where the results of the instance segmentation branch and the semantic segmentation branch conflict; in this embodiment, it specifically includes:
S410, denoting the N semantic categories as {cls_1, cls_2, ..., cls_N} and the classification results of the instance segmentation branch and the semantic segmentation branch at a given pixel as Y = {y_1, y_2}, the posterior probability that y_k belongs to any category is P(cls_i | y_k),
where P(cls_i | y_k), k = 1, 2, are the posterior probabilities of the instance segmentation branch and the semantic segmentation branch for the different semantic categories;
S420, for each pixel, calculating the joint probability distribution under Bayesian theory as:
P(cls_i | Y) = P(cls_i | y_1) P(cls_i | y_2)
S430, determining the pixel category according to the maximum a posteriori criterion, the final semantic category to which the pixel belongs being:
Class(Y) = argmax_i (P(cls_i | Y)), i = 1, 2, ..., N
where Class(Y) is the category to which pixel Y belongs;
S440, applying Bayesian decision fusion to the pixels where the two branch predictions conflict and merging the remaining non-conflicting predictions, thereby obtaining the panoramic segmentation image.
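As an illustration, a minimal sketch of this fusion rule in Python follows, under the assumption (not fixed by the patent) that each branch exposes its per-pixel posteriors as an (N, H, W) probability array; the names inst_prob and sem_prob are illustrative.

import numpy as np

def fuse_branches(inst_prob, sem_prob):
    """inst_prob, sem_prob: (N, H, W) per-pixel posteriors P(cls_i | y_k)."""
    inst_cls = inst_prob.argmax(axis=0)   # hard decision of each branch
    sem_cls = sem_prob.argmax(axis=0)
    # S420: joint posterior P(cls_i | Y) = P(cls_i | y_1) P(cls_i | y_2)
    joint = inst_prob * sem_prob
    fused = joint.argmax(axis=0)          # S430: maximum a posteriori category
    # S440: Bayesian fusion only where the branch predictions conflict;
    # where they agree, the common prediction is kept unchanged
    return np.where(inst_cls == sem_cls, inst_cls, fused)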
S500, carrying out pixel clustering on the panoramic segmentation image to generate a scene information knowledge graph.
In this embodiment, the input of this step is the panoramic segmentation image of the high-resolution remote sensing image generated through Bayesian decision fusion; following the same method as step S130, a scene information knowledge graph that does not contain target behavior motion information is generated.
As shown in fig. 1, in the present embodiment, the knowledge reasoning stage includes:
S600, discriminating the motion direction of the target of interest in the scene information knowledge graph according to the graph attention network.
In this embodiment, this specifically includes:
S610, taking the scene information knowledge graph as input and feeding it into a graph attention network;
S620, for each node in the scene information knowledge graph, calculating the correlation between node v_i and every other node v_j within its neighborhood N_i as the attention coefficient e_{ij}:
e_{ij} = σ(a(Wh_i || Wh_j))
where h_i and h_j are the features of nodes v_i and v_j respectively, a and W are a learnable weight vector and weight matrix, || is the concatenation operation, σ is a nonlinear function, and the neighborhood N_i consists of first-order or second-order neighbors;
S630, to make the attention values comparable across different nodes, normalizing with softmax, the attention matrix is obtained as:
α_{ij} = exp(e_{ij}) / Σ_{k∈N_i} exp(e_{ik})
S640, to obtain a more stable learning effect, introducing a multi-head attention mechanism and using K independent attention heads, the attention features are obtained as:
h′_i = ||_{k=1}^{K} σ( Σ_{j∈N_i} α^k_{ij} W^k h_j )
where h′_i is the output feature of node v_i after the multi-head attention mechanism, and α^k and W^k are the attention matrix and the weight matrix on attention head k;
S650, extracting the feature information of the scene information knowledge graph with the graph attention network and reasoning about the behavior motion information of the target of interest.
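As an illustration, a minimal sketch of S620-S640 in Python (PyTorch) follows, with a dense 0/1 adjacency mask standing in for the neighborhood N_i; LeakyReLU as the nonlinearity σ, ELU between heads, and the feature sizes are assumptions, not choices fixed by the patent.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionHead(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)   # weight matrix W
        self.a = nn.Linear(2 * out_dim, 1, bias=False)    # weight vector a

    def forward(self, h, adj):
        # h: (n, in_dim) node features; adj: (n, n) 0/1 neighbourhood mask N_i
        wh = self.W(h)                                    # Wh_i
        n = wh.size(0)
        # S620: e_ij = sigma(a(Wh_i || Wh_j)) for every node pair
        pairs = torch.cat([wh.unsqueeze(1).expand(n, n, -1),
                           wh.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1))
        # S630: softmax restricted to each node's neighbourhood
        e = e.masked_fill(adj == 0, float("-inf"))
        alpha = torch.softmax(e, dim=1)                   # attention matrix
        return alpha @ wh                                 # aggregated features

class MultiHeadGAT(nn.Module):
    def __init__(self, in_dim, out_dim, k=4):
        super().__init__()
        self.heads = nn.ModuleList(GraphAttentionHead(in_dim, out_dim)
                                   for _ in range(k))

    def forward(self, h, adj):
        # S640: h'_i concatenates the K heads through a nonlinearity
        return torch.cat([F.elu(head(h, adj)) for head in self.heads], dim=-1)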
In summary, the remote sensing image target motion direction discrimination method designs a pixel-clustering-based knowledge graph generation method so that a target motion direction knowledge graph is generated quickly from the panoramic segmentation image of the high-resolution remote sensing image; by introducing the cross-attention module, the long-range context extraction capability of the feature extraction network is enhanced; by constructing the panoramic segmentation model, instance-level and semantic-level segmentations of the high-resolution remote sensing image are obtained respectively; by introducing the Bayesian-decision-based branch fusion module, the conflicts that arise when the two branch results are fused are resolved and a high-precision panoramic segmentation result is obtained, achieving the goal of scene segmentation; by constructing a graph attention network, attention is focused on the explicit and latent associations between targets, and between targets and the environment, in the scene, realizing behavior motion direction reasoning for the target of interest; the remote sensing image target motion direction discriminating method provided by the embodiment of the invention can rapidly understand the scene environment of key areas of interest, intelligently infer the motion direction of the target of interest, and realize highly timely, high-precision, and highly intelligent target motion direction discrimination for high-resolution remote sensing images.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (7)

1. A remote sensing image target motion direction discriminating method, characterized by comprising the following steps:
S100, carrying out panoramic segmentation labeling and target behavior motion direction labeling on the remote sensing image, and determining a target motion direction knowledge graph according to the labeling result;
S200, establishing a panoramic segmentation model based on the remote sensing image, using ResNet as the feature extraction backbone network, and introducing a cross-attention module to extract long-range context information;
S300, performing instance-level segmentation on the remote sensing image according to the instance segmentation branch of the panoramic segmentation network, and performing semantic-level segmentation on the remote sensing image according to the semantic segmentation branch of the panoramic segmentation network;
S400, introducing a Bayesian-decision-based branch fusion module, and carrying out decision fusion on the results of the instance segmentation branch and the semantic segmentation branch to generate a panoramic segmentation image;
S500, carrying out pixel clustering on the panoramic segmentation image to generate a scene information knowledge graph;
S600, discriminating the motion direction of the target of interest in the scene information knowledge graph according to the graph attention network;
wherein the S100 includes:
S110, carrying out instance-level labeling on foreground targets and semantic-level labeling on background targets;
S120, additionally labeling behavior motion information for the targets of interest;
S130, using a pixel-clustering-based knowledge graph generation method, taking the panoramic segmentation image and the target behavior motion information as input to generate the target motion direction knowledge graph.
2. The method according to claim 1, wherein in S110:
performing instance-level labeling on the foreground targets, including pixel-level classification on the foreground targets according to semantic categories, and dividing the instances so that different foreground targets in the same semantic category have different instance numbers;
carrying out semantic level labeling on background targets, including carrying out pixel level classification on different background targets according to semantic categories;
the S130 includes:
Let the number of semantic categories in the panoramic segmentation image be N and the number of instance objects be M, and let each pixel point P_{x,y} be represented by a quadruple (x, y, cls, ins), where x and y are the coordinates of pixel P_{x,y}, cls = 1, 2, ..., N is the semantic category of pixel P_{x,y}, and ins = 1, 2, ..., M is the instance number of pixel P_{x,y}; then:
S131, creating an empty knowledge graph G(V, E), where V and E are the sets of nodes and edges respectively, both initially empty;
S132, selecting any pixel point P_{x,y} as the current clustering focus, creating a cluster v_i = {P_{x,y} | ins ∈ P_{x,y}, ins = i}, adding node v_i to the knowledge graph G, and adding the category and behavior of the clustering focus P_{x,y} to the knowledge graph G as attributes of v_i;
S133, traversing the pixel points P_{x',y'} adjacent to the clustering focus P_{x,y}: if pixel P_{x',y'} belongs to cluster v_i (its instance number satisfies ins = i), adding it to cluster v_i; otherwise, creating the cluster v_j = {P_{x,y} | ins ∈ P_{x,y}, ins = j} to which it belongs, adding node v_j together with its category and behavior as attributes of v_j to the knowledge graph G, and generating an edge e_{ij} between v_i and v_j;
S134, taking the pixel point P_{x',y'} in place of the original clustering focus P_{x,y} as the new clustering focus, and repeating S133 until every pixel belongs to a cluster;
S135, taking the resulting knowledge graph G(V, E) as the target motion direction knowledge graph, in which every node v has a category and a behavior motion direction as attributes.
3. The method according to claim 2, wherein S200 includes:
S210, establishing a panoramic segmentation model based on the remote sensing image, using ResNet as the feature extraction backbone network, and denoting the extracted feature map as A ∈ R^{C×H×W}, where C is the number of channels of the feature map and H×W is the size of the feature map;
S220, applying 1×1 convolutions to feature map A for channel compression, obtaining feature maps P ∈ R^{1×H×W} and Q ∈ R^{1×H×W};
S230, for each pixel u in feature map P, calculating the correlation e_{ui} (i = 1, 2, ..., H×W−1) between u and the pixels μ_1, μ_2, μ_3, ..., μ_{H×W−1} in the same row and column at the corresponding position in feature map Q, normalizing with softmax to obtain the cross-attention x_{ui} of pixel u,
and determining the cross-attention matrix X ∈ R^{(H×W−1)×H×W} from the cross-attention values x_{ui};
S240, generating the feature map E ∈ R^{C×H×W} from the cross-attention matrix X and the feature map A as the output of the cross-attention module:
E = ω(X × A) + A
where ω is a learnable weight parameter.
4. The method according to claim 3, wherein the step S300 includes:
based on the output E ∈ R^{C×H×W} of the cross-attention module, the instance segmentation branch uses region proposals to perform bounding-box regression on candidate regions, generates the prediction probability of each target category, and generates a segmentation mask within each bounding box through convolution layers, obtaining the instance-level segmentation of the remote sensing image;
based on the output E ∈ R^{C×H×W} of the cross-attention module, the semantic segmentation branch uses convolution layers to extract semantic features from the remote sensing image and generates a segmentation mask for each semantic category, obtaining the semantic-level segmentation of the remote sensing image.
5. The method of claim 4, wherein S400 includes:
S410, denoting the N semantic categories as {cls_1, cls_2, ..., cls_N} and the classification results of the instance segmentation branch and the semantic segmentation branch at a given pixel as Y = {y_1, y_2}, the posterior probability that y_k belongs to any category is P(cls_i | y_k),
where P(cls_i | y_k), k = 1, 2, are the posterior probabilities of the instance segmentation branch and the semantic segmentation branch for the different semantic categories;
S420, for each pixel, calculating the joint probability distribution under Bayesian theory as:
P(cls_i | Y) = P(cls_i | y_1) P(cls_i | y_2)
S430, determining the pixel category according to the maximum a posteriori criterion, the final semantic category to which the pixel belongs being:
Class(Y) = argmax_i (P(cls_i | Y)), i = 1, 2, ..., N
where Class(Y) is the category to which pixel Y belongs;
S440, applying Bayesian decision fusion to the pixels where the two branch predictions conflict and merging the remaining non-conflicting predictions, thereby obtaining the panoramic segmentation image.
6. The method of claim 5, wherein S500 includes:
using the panoramic segmentation image as input, generating a scene information knowledge graph that does not contain target behavior motion information according to the method of S130.
7. The method according to claim 5, wherein the step S600 includes:
S610, taking the scene information knowledge graph as input and feeding it into a graph attention network;
S620, for each node in the scene information knowledge graph, calculating the correlation between node v_i and every other node v_j within its neighborhood N_i as the attention coefficient e_{ij}:
e_{ij} = σ(a(Wh_i || Wh_j))
where h_i and h_j are the features of nodes v_i and v_j respectively, a and W are a learnable weight vector and weight matrix, || is the concatenation operation, σ is a nonlinear function, and the neighborhood N_i consists of first-order or second-order neighbors;
S630, normalizing with softmax, the attention matrix is obtained as:
α_{ij} = exp(e_{ij}) / Σ_{k∈N_i} exp(e_{ik})
S640, introducing a multi-head attention mechanism and using K independent attention heads, the attention features are obtained as:
h′_i = ||_{k=1}^{K} σ( Σ_{j∈N_i} α^k_{ij} W^k h_j )
where h′_i is the output feature of node v_i after the multi-head attention mechanism, and α^k and W^k are the attention matrix and the weight matrix on attention head k;
S650, extracting the feature information of the scene information knowledge graph with the graph attention network and reasoning about the behavior motion information of the target of interest.
CN202310477115.9A 2023-04-27 2023-04-27 Remote sensing image target motion direction discriminating method Active CN116486169B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310477115.9A CN116486169B (en) 2023-04-27 2023-04-27 Remote sensing image target motion direction discriminating method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310477115.9A CN116486169B (en) 2023-04-27 2023-04-27 Remote sensing image target motion direction discriminating method

Publications (2)

Publication Number Publication Date
CN116486169A CN116486169A (en) 2023-07-25
CN116486169B (en) 2023-12-19

Family

ID=87211645

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310477115.9A Active CN116486169B (en) 2023-04-27 2023-04-27 Remote sensing image target motion direction discriminating method

Country Status (1)

Country Link
CN (1) CN116486169B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103561261A (en) * 2013-10-12 2014-02-05 重庆邮电大学 Panoramic locatable video coding method based on visual attention
CN113780149A (en) * 2021-09-07 2021-12-10 北京航空航天大学 Method for efficiently extracting building target of remote sensing image based on attention mechanism
CN115100652A (en) * 2022-08-02 2022-09-23 北京卫星信息工程研究所 Electronic map automatic generation method based on high-resolution remote sensing image
CN115115939A (en) * 2022-07-28 2022-09-27 北京卫星信息工程研究所 Remote sensing image target fine-grained identification method based on characteristic attention mechanism
CN115908908A (en) * 2022-11-14 2023-04-04 北京卫星信息工程研究所 Remote sensing image gathering type target identification method and device based on graph attention network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10335091B2 (en) * 2014-03-19 2019-07-02 Tactonic Technologies, Llc Method and apparatus to infer object and agent properties, activity capacities, behaviors, and intents from contact and pressure images
US11030486B2 (en) * 2018-04-20 2021-06-08 XNOR.ai, Inc. Image classification through label progression

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103561261A (en) * 2013-10-12 2014-02-05 重庆邮电大学 Panoramic locatable video coding method based on visual attention
CN113780149A (en) * 2021-09-07 2021-12-10 北京航空航天大学 Method for efficiently extracting building target of remote sensing image based on attention mechanism
CN115115939A (en) * 2022-07-28 2022-09-27 北京卫星信息工程研究所 Remote sensing image target fine-grained identification method based on characteristic attention mechanism
CN115100652A (en) * 2022-08-02 2022-09-23 北京卫星信息工程研究所 Electronic map automatic generation method based on high-resolution remote sensing image
CN115908908A (en) * 2022-11-14 2023-04-04 北京卫星信息工程研究所 Remote sensing image gathering type target identification method and device based on graph attention network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
深度卷积神经网络图像语义分割研究进展 [Research progress of image semantic segmentation based on deep convolutional neural networks]; 青晨, 禹晶, 肖创柏, 段娟; 中国图象图形学报 (Journal of Image and Graphics), No. 06; full text *

Also Published As

Publication number Publication date
CN116486169A (en) 2023-07-25

Similar Documents

Publication Publication Date Title
CN110276765B (en) Image panorama segmentation method based on multitask learning deep neural network
CN109685067B (en) Image semantic segmentation method based on region and depth residual error network
CN110313017B (en) Machine vision method for classifying input data based on object components
CN111462282A (en) Scene graph generation method
US8170283B2 (en) Video surveillance system configured to analyze complex behaviors using alternating layers of clustering and sequencing
CN110633632A (en) Weak supervision combined target detection and semantic segmentation method based on loop guidance
CN110580499B (en) Deep learning target detection method and system based on crowdsourcing repeated labels
Li et al. Transmission line detection in aerial images: An instance segmentation approach based on multitask neural networks
CN112966555A (en) Remote sensing image airplane identification method based on deep learning and component prior
CN114972758B (en) Instance segmentation method based on point cloud weak supervision
CN112766409A (en) Feature fusion method for remote sensing image target detection
Wang et al. Joint-learning segmentation in Internet of drones (IoD)-based monitor systems
CN115984537A (en) Image processing method and device and related equipment
CN116363374A (en) Image semantic segmentation network continuous learning method, system, equipment and storage medium
CN116863384A (en) CNN-Transfomer-based self-supervision video segmentation method and system
CN117690098B (en) Multi-label identification method based on dynamic graph convolution under open driving scene
Farah et al. Interpretation of multisensor remote sensing images: Multiapproach fusion of uncertain information
Ahmadi et al. A hybrid of inference and stacked classifiers to indoor scenes classification of rgb-d images
Yasmin et al. Small obstacles detection on roads scenes using semantic segmentation for the safe navigation of autonomous vehicles
CN116486169B (en) Remote sensing image target motion direction discriminating method
Saravanarajan et al. Improving semantic segmentation under hazy weather for autonomous vehicles using explainable artificial intelligence and adaptive dehazing approach
Koch et al. Estimating Object Perception Performance in Aerial Imagery Using a Bayesian Approach
Li et al. Real-time 3D object proposal generation and classification using limited processing resources
CN114821188A (en) Image processing method, training method of scene graph generation model and electronic equipment
Ho NBDT: Neural-backed decision trees

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant