CN116486238A - Target fine granularity identification method combining point set representation and graph classification


Info

Publication number
CN116486238A
Authority
CN
China
Prior art keywords
graph
target
point set
nodes
features
Prior art date
Legal status
Granted
Application number
CN202310466470.6A
Other languages
Chinese (zh)
Other versions
CN116486238B (en)
Inventor
梁颖
贺广均
冯鹏铭
陈千千
上官博屹
金世超
常江
田路云
Current Assignee
Beijing Institute of Satellite Information Engineering
Original Assignee
Beijing Institute of Satellite Information Engineering
Priority date
Filing date
Publication date
Application filed by Beijing Institute of Satellite Information Engineering
Priority to CN202310466470.6A
Publication of CN116486238A
Application granted
Publication of CN116486238B
Legal status: Active


Classifications

    • G06V 10/86 - Image or video recognition using syntactic or structural representations of the image or video pattern, e.g. symbolic string recognition or graph matching
    • G06N 3/042 - Knowledge-based neural networks; logical representations of neural networks
    • G06N 3/0464 - Convolutional networks [CNN, ConvNet]
    • G06N 3/08 - Neural network learning methods
    • G06V 10/26 - Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/764 - Recognition using classification, e.g. of video objects
    • G06V 10/765 - Classification using rules for classification or for partitioning the feature space
    • G06V 10/774 - Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/806 - Fusion of extracted features at the feature extraction level
    • G06V 10/82 - Recognition using neural networks
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a fine-grained target recognition method combining point set representation and graph classification, comprising the following steps: constructing and training an Oriented RepPoints-based target point set representation model, and detecting to generate a point set representing the target; taking the points in the point set as nodes and constructing a graph structure according to their spatial relationships; cropping a rectangular region around each point and taking the convolutional features corresponding to the region as the features of the graph nodes; and constructing a graph convolutional neural network model, aggregating and updating the features of the graph nodes, and integrating the features of all graph nodes for graph classification. By comprehensively exploiting the component features of targets and the relations among the components, the scheme of the invention improves the accuracy of fine-grained target recognition.

Description

Target fine granularity identification method combining point set representation and graph classification
Technical Field
The invention relates to the technical field of fine-grained target recognition in remote sensing images, and in particular to a fine-grained target recognition method combining point set representation and graph classification.
Background
Ships are the primary carriers of maritime traffic and play an important role in human marine activities. Ship target monitoring can be applied to fishery management, maritime traffic safety and the like in the civilian field, and to tasks such as intelligence reconnaissance and sea battlefield situation monitoring in the military field.
Deep learning has achieved excellent results in target detection and recognition tasks and has become the mainstream approach in the field of fine-grained ship target recognition. One class of methods builds separate convolutional neural network models for ship detection and for fine-grained recognition, so the features of the backbone network cannot be shared, which reduces the efficiency of model training and online detection. Another class of methods integrates the ship detection and fine-grained recognition tasks and improves efficiency by sharing backbone network features. For example, the fine-grained ship target recognition method and device disclosed in Chinese patent CN115272856A constructs a coarse keypoint detection network to extract approximate keypoint positions and, on that basis, performs fine-grained classification of ship targets with a keypoint-attention-based classification sub-network; however, this method requires annotating both the bounding boxes and the keypoints of ship targets in remote sensing images, which makes annotation difficult.
In addition, both kinds of methods neglect the geometric information, component features and component relations of ship targets, which are critical for fine-grained ship target recognition.
Disclosure of Invention
To solve the above technical problems in the prior art, the invention aims to provide a fine-grained target recognition method combining point set representation and graph classification that improves the accuracy of fine-grained target recognition by comprehensively exploiting the component features of targets and the relations among the components.
To this end, the technical scheme of the invention is as follows:
An embodiment of the invention provides a fine-grained target recognition method combining point set representation and graph classification, comprising the following steps:
S110, constructing and training an Oriented RepPoints-based target point set representation model, and detecting to generate a point set representing the target;
S120, taking the points in the point set as nodes, and constructing a graph structure according to their spatial relationships;
S130, cropping a rectangular region around each point, and taking the convolutional features corresponding to the region as the features of the graph nodes;
and S140, constructing a graph convolutional neural network model, aggregating and updating the features of the graph nodes, and integrating the features of all graph nodes for graph classification.
According to an aspect of an embodiment of the present invention, the Oriented RepPoints-based target point set representation model constructed in S110 comprises:
a backbone network for extracting multi-scale features of the target; and
a localization and classification head for generating, on the multi-scale feature maps, a point set representing the target, thereby localizing and classifying the target.
According to an aspect of an embodiment of the present invention, the backbone network is a ResNet50-FPN.
According to an aspect of an embodiment of the present invention, the localization and classification head comprises a localization branch and a classification branch;
the localization branch comprises two stages, the first stage generating a candidate point set, and the second stage refining the candidate point set and producing the final target localization;
the classification branch classifies the targets represented by the candidate point sets generated in the first stage.
According to an aspect of an embodiment of the present invention, in S110 the training data of the Oriented RepPoints-based target point set representation model are remote sensing images containing targets, each target being annotated with its minimum enclosing rectangle in the form of the coordinates of the rectangle's four corner points;
the Oriented RepPoints-based target point set representation model is trained and optimized by minimizing the localization and classification losses of the first and second stages.
According to an aspect of an embodiment of the present invention, the detection in S110 that generates a point set representing the target comprises:
inputting a remote sensing image into the trained Oriented RepPoints-based target point set representation model for detection, saving the point sets and corresponding bounding boxes of all targets output by the model, and saving the features output by the classification branch.
According to an aspect of an embodiment of the present invention, S120 comprises: for each point in the point set, computing the intersection of the bounding box with a rectangle centered at the point whose side length equals the minimum side of the bounding box, and taking the intersection as the target local region corresponding to that point; after a target local region has been obtained for every point in the point set, establishing an edge between two nodes if their target local regions intersect.
According to an aspect of an embodiment of the present invention, S130 comprises:
mapping the minimum enclosing rectangle of the target local region corresponding to each point in the point set onto the feature map obtained from the classification branch, and extracting the features of the target local region;
and generating fixed-size features by rotated feature alignment of the target local region, and flattening them into a one-dimensional feature vector that serves as the features of the graph node.
According to an aspect of an embodiment of the present invention, the graph convolutional neural network model comprises three graph convolution layers, each of which aggregates the features of graph nodes within a first-order neighborhood.
According to an aspect of an embodiment of the present invention, aggregating and updating the features of the graph nodes and integrating the features of all graph nodes for graph classification in S140 comprises:
converting the features of the graph nodes into higher-level features by a linear transformation;
computing the convolution on the graph from the transformed features, and aggregating and updating the features of the graph nodes;
after the graph convolution layers, concatenating, for each graph node, the features output by each graph convolution layer, and then applying global max pooling over the features of all graph nodes in the whole graph to obtain the feature vector of the whole graph;
and passing the feature vector of the whole graph to a multi-layer perceptron composed of several fully connected layers to obtain the graph classification result, thereby achieving fine-grained recognition of the target.
Compared with the prior art, the invention has the following beneficial effects:
according to the scheme of the embodiment of the invention, the representative point set is used to extract the component features and geometric features of the ship target; the shape and component features of the ship target are comprehensively exploited by constructing graph-structured data over the target's key points to express the relations among components; and the feature representation capability is improved through the graph convolutional neural network model and multi-feature fusion, thereby realizing graph classification and improving the accuracy of fine-grained ship target recognition.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required by the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without inventive effort.
FIG. 1 schematically illustrates a flow chart of the fine-grained target recognition method combining point set representation and graph classification disclosed in an embodiment of the invention;
FIG. 2 schematically illustrates the process by which the Oriented RepPoints-based target point set representation model disclosed in an embodiment of the invention generates the point set representation of a target;
FIG. 3 schematically illustrates the process of forming the features of a graph node disclosed in an embodiment of the present invention.
Detailed Description
The description of the embodiments of this specification should be read in conjunction with the accompanying drawings, which form part of the complete description. In the drawings, the shape or thickness of elements may be enlarged or simplified for convenience of illustration. Portions of the structures in the drawings are described separately; elements not shown or described in the drawings are of a form known to those of ordinary skill in the art.
References to directions and orientations in the description of the embodiments are for convenience only and should not be construed as limiting the scope of the invention in any way. The following description of the preferred embodiments refers to combinations of features, which may be present alone or in combination; the invention is not limited to the preferred embodiments. The scope of the invention is defined by the claims.
As shown in FIG. 1, an embodiment of the invention discloses a fine-grained target recognition method combining point set representation and graph classification, which is applicable to fine-grained recognition of ships, aircraft and other types of targets. The method comprises the following steps:
S110, constructing and training an Oriented RepPoints-based target point set representation model, and detecting to generate a point set representing the target;
S120, taking the points in the point set as nodes, and constructing a graph structure according to their spatial relationships;
S130, cropping a rectangular region around each point, and taking the convolutional features corresponding to the region as the features of the graph nodes;
and S140, constructing a graph convolutional neural network model, aggregating and updating the features of the graph nodes, and integrating the features of all graph nodes for graph classification.
In this embodiment, the Oriented RepPoints-based target point set representation model constructed in S110 consists of a backbone network and a localization and classification head. The backbone network extracts multi-scale features of the target; the localization and classification head generates a point set representing the target on the multi-scale feature maps and thereby localizes and classifies the target. Through the effective adaptive point learning of the Oriented RepPoints model, the target is represented more finely by an adaptive point set, which captures the key semantic features and geometric structure of the target object. According to one embodiment of the invention, the backbone network is a ResNet50-FPN.
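For illustration, the following is a minimal sketch of such a backbone, assuming PyTorch with torchvision 0.13 or later, whose detection utilities ship a ready-made ResNet50-FPN; the tile size and variable names are illustrative, not values from the patent:

    import torch
    from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

    # ResNet50 + FPN backbone; every pyramid level outputs 256-channel features.
    backbone = resnet_fpn_backbone(backbone_name="resnet50", weights=None)

    x = torch.randn(1, 3, 512, 512)     # a dummy remote sensing image tile
    features = backbone(x)              # OrderedDict of multi-scale feature maps
    for name, fmap in features.items():
        print(name, tuple(fmap.shape))  # pyramid levels '0'..'3' and 'pool'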
Specifically, the localization and classification head consists of two branches: a localization branch and a classification branch. The localization branch comprises two stages: the first stage generates a candidate point set, and the second stage refines the candidate point set and produces the final target localization. The classification branch classifies the targets represented by the candidate point sets generated in the first stage.
Illustratively, as shown in FIG. 2, the localization branch first applies three 3×3×256 convolutions to the multi-scale feature maps extracted by the backbone network to generate a feature map F_l. In the first stage, a 3×3×256 convolution followed by a 1×1×18 convolution is applied to F_l to learn the offsets OF_1 of a 3×3 deformable convolution kernel. OF_1 has 18 channels, representing the x- and y-direction offsets of the 9 sampling points of the 3×3 deformable convolution; each spatial position of OF_1 thus defines a point set, the coordinates of each point being that position plus the offset stored there, and an oriented rectangular box can be generated from the point set by an orientation conversion function. In the second stage, a 3×3×256 deformable convolution and a 1×1×18 convolution are applied to F_l to generate refined offsets OF_2, from which the final point set representation and localization of the target are obtained.
The classification branch likewise first applies three 3×3×256 convolutions to the multi-scale feature maps extracted by the backbone network to generate a feature map F_c, and then applies a 3×3×256 deformable convolution to F_c whose offsets are shared with the offsets OF_2 of the second-stage deformable convolution in the localization branch; finally, a 1×1×2 convolution classifies target versus background.
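The head described above can be sketched as follows. This is a hedged illustration assuming PyTorch: torchvision's DeformConv2d takes the offset field as a second input, which makes the offset sharing between the second localization stage and the classification branch explicit; the module name PointSetHead and the three-layer stems follow the text's channel and kernel sizes but are otherwise assumptions, not the patent's exact implementation:

    import torch
    import torch.nn as nn
    from torchvision.ops import DeformConv2d

    class PointSetHead(nn.Module):
        def __init__(self, c=256, n_points=9):
            super().__init__()
            def stem():  # three 3x3x256 convolutions, as in the text
                return nn.Sequential(*[m for _ in range(3)
                                       for m in (nn.Conv2d(c, c, 3, padding=1), nn.ReLU())])
            self.loc_stem, self.cls_stem = stem(), stem()
            # Stage 1: 3x3x256 conv + 1x1x18 conv -> offsets OF_1 (9 points x 2 coords).
            self.offset1 = nn.Sequential(nn.Conv2d(c, c, 3, padding=1), nn.ReLU(),
                                         nn.Conv2d(c, 2 * n_points, 1))
            # Stage 2: 3x3 deformable conv + 1x1x18 conv -> refined offsets OF_2.
            self.refine = DeformConv2d(c, c, 3, padding=1)
            self.offset2 = nn.Conv2d(c, 2 * n_points, 1)
            # Classification: 3x3 deformable conv sharing OF_2, then 1x1x2 conv.
            self.cls_dcn = DeformConv2d(c, c, 3, padding=1)
            self.cls_out = nn.Conv2d(c, 2, 1)

        def forward(self, feat):
            f_l, f_c = self.loc_stem(feat), self.cls_stem(feat)
            of1 = self.offset1(f_l)                                  # candidate point sets
            of2 = self.offset2(torch.relu(self.refine(f_l, of1)))   # refined point sets
            cls = self.cls_out(torch.relu(self.cls_dcn(f_c, of2)))  # target vs. background
            return of1, of2, cls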
In this embodiment, the training data of the Oriented RepPoints-based target point set representation model in S110 are remote sensing images containing targets, each target being annotated with its minimum enclosing rectangle in the form of the coordinates of the rectangle's four corner points. The model is trained and optimized by minimizing the localization and classification losses of the first and second stages.
In this embodiment, the detection in S110 that generates the point sets representing the targets proceeds as follows: a remote sensing image is input into the trained Oriented RepPoints-based target point set representation model for detection; the point sets R and corresponding bounding boxes B of all targets output by the model are saved, together with the feature map F_c output by the classification branch.
In this embodiment, the construction in S120 of a graph structure with the points of the point set as nodes, according to their spatial relationships, proceeds as follows: for each point, the intersection C of the bounding box B with a rectangle T centered at the point, whose side length is 1.5 times the minimum side of the bounding box, is computed and taken as the target local region corresponding to that point; after a target local region has been obtained for every point in the point set, an edge is established between two nodes whenever their target local regions intersect.
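A minimal sketch of this graph construction, assuming NumPy, with axis-aligned boxes for simplicity and an illustrative function name:

    import numpy as np

    def build_graph(points, bbox, factor=1.5):
        # points: (N, 2) array of (x, y); bbox: (x0, y0, x1, y1).
        x0, y0, x1, y1 = bbox
        side = factor * min(x1 - x0, y1 - y0)   # 1.5x the box's minimum side
        # Local region of each point: a square centered on the point, clipped to the box.
        regions = np.stack([np.maximum(points[:, 0] - side / 2, x0),
                            np.maximum(points[:, 1] - side / 2, y0),
                            np.minimum(points[:, 0] + side / 2, x1),
                            np.minimum(points[:, 1] + side / 2, y1)], axis=1)
        n = len(points)
        adj = np.zeros((n, n), dtype=bool)
        for i in range(n):
            for j in range(i + 1, n):
                a, b = regions[i], regions[j]
                # Two rectangles intersect iff they overlap on both axes.
                if a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]:
                    adj[i, j] = adj[j, i] = True
        return regions, adj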
In this embodiment, as shown in FIG. 3, the cropping in S130 of a rectangular region around each point, with the convolutional features of that region serving as the graph node features, proceeds as follows: the minimum enclosing rectangle of the target local region corresponding to each point in the point set is mapped onto the feature map F_c produced by the classification branch, and the features of the target local region are extracted; fixed-size features are then generated by rotated feature alignment of the target local region and flattened into a one-dimensional feature vector that serves as the features of the graph node.
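This feature extraction can be sketched as follows, assuming PyTorch/torchvision. Here roi_align on the axis-aligned minimum enclosing boxes stands in for the rotated feature alignment of the text, which would require a rotated RoI-align operator (e.g. the one in mmcv); the stride and pooled size are illustrative assumptions:

    import torch
    from torchvision.ops import roi_align

    def node_features(f_c, regions, stride=8, out_size=7):
        # f_c: (1, C, H, W) classification feature map; regions: (N, 4) boxes in image coords.
        boxes = torch.as_tensor(regions, dtype=torch.float32)
        rois = torch.cat([torch.zeros(len(boxes), 1), boxes], dim=1)   # prepend batch index
        pooled = roi_align(f_c, rois, output_size=out_size,
                           spatial_scale=1.0 / stride)                 # fixed-size features
        return pooled.flatten(1)   # one 1-D feature vector per graph node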
In this embodiment, the graph convolutional neural network model constructed in S140 comprises three graph convolution layers, each of which aggregates the features of graph nodes within the first-order neighborhood.
In this embodiment, the aggregation and updating of the graph node features by the graph convolutional neural network model in S140, and the integration of the features of all graph nodes for graph classification, proceed as follows:
The features h of the graph nodes are first converted to higher-level features by a linear transformation Wh, where the transformation matrix W ∈ R^{d′×d} is a learnable parameter matrix shared across nodes.
The convolution on the graph is then computed from the transformed features according to the following formula, which aggregates and updates the features of graph node i:

h′_i = ReLU( Σ_{j∈N_i} α_ij · W h_j )

where h′_i is the updated feature of node i, ReLU(x) = max(0, x) is a nonlinear activation function, N_i denotes the first-order neighborhood of node i, j indexes the nodes in N_i, and α_ij is an aggregation weight set to the reciprocal of the number of nodes in the neighborhood, i.e., the inverse of the node degree.
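A minimal sketch of one such layer, assuming PyTorch, with aggregation over the first-order neighborhood and α_ij = 1/|N_i| matching the degree-based weight above:

    import torch
    import torch.nn as nn

    class GraphConvLayer(nn.Module):
        def __init__(self, d_in, d_out):
            super().__init__()
            self.W = nn.Linear(d_in, d_out, bias=False)   # learnable shared transform W

        def forward(self, h, adj):
            # h: (N, d_in) node features; adj: (N, N) boolean adjacency matrix.
            a = adj.float()
            deg = a.sum(dim=1, keepdim=True).clamp(min=1)  # |N_i|, the node degree
            agg = (a @ self.W(h)) / deg                    # sum_j alpha_ij * W h_j
            return torch.relu(agg)                         # updated features h'_i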
After the graph convolution layers, the features output by each layer are concatenated for every graph node, and global max pooling is then applied over all graph nodes to read out the feature representation of the whole graph G:

h_{G,k} = max_{i∈G} h′_{i,k}

where h_{G,k} denotes the k-th value of the feature vector h_G of the whole graph G, and h′_{i,k} denotes the k-th value of the concatenated feature vector h′_i corresponding to node i;
for the feature vector h G,k, The feature vector h G,k And transmitting the image classification result to a multi-layer perceptron formed by a plurality of fully connected layers to obtain the image classification result, wherein the image classification result is that the image type corresponds to the fine granularity type of the ship target, so that the fine granularity identification of the target is realized.
According to the above scheme, the representative point set is used to extract the component features and geometric features of the ship target; the shape and component features of the ship target are comprehensively exploited by constructing graph-structured data over the target's key points to express the relations among components; and the feature representation capability is improved through the graph convolutional neural network model and multi-feature fusion, thereby realizing graph classification and improving the accuracy of fine-grained ship target recognition.
The sequence numbers of the steps of the method do not imply an execution order; the execution order is determined by the functions and internal logic of the steps, and should not limit the implementation of the embodiments of the invention in any way.
The foregoing description of preferred embodiments is not intended to limit the invention; any modification, equivalent replacement or improvement made within the spirit and principles of the invention falls within its scope.

Claims (10)

1. A fine-grained target recognition method combining point set representation and graph classification, comprising:
S110, constructing and training an Oriented RepPoints-based target point set representation model, and detecting to generate a point set representing a target;
S120, taking the points in the point set as nodes, and constructing a graph structure according to their spatial relationships;
S130, cropping a rectangular region around each point, and taking the convolutional features corresponding to the region as the features of the graph nodes;
and S140, constructing a graph convolutional neural network model, aggregating and updating the features of the graph nodes, and integrating the features of all graph nodes for graph classification.
2. The method of claim 1, wherein the Oriented RepPoints-based target point set representation model constructed in S110 comprises:
a backbone network for extracting multi-scale features of the target; and
a localization and classification head for generating, on the multi-scale feature maps, a point set representing the target, thereby localizing and classifying the target.
3. The method of claim 2, wherein the backbone network is a ResNet50-FPN.
4. The method of claim 2, wherein the localization and classification head comprises a localization branch and a classification branch,
the localization branch comprising two stages, the first stage generating a candidate point set and the second stage refining the candidate point set and producing the final target localization,
and the classification branch classifying the targets represented by the candidate point sets generated in the first stage.
5. The method of claim 4, wherein in S110 the training data of the Oriented RepPoints-based target point set representation model are remote sensing images containing targets, each target being annotated with its minimum enclosing rectangle in the form of the coordinates of the rectangle's four corner points;
and wherein the Oriented RepPoints-based target point set representation model is trained and optimized by minimizing the localization and classification losses of the first and second stages.
6. The method of claim 5, wherein the detection in S110 that generates a point set representing a target comprises:
inputting a remote sensing image into the trained Oriented RepPoints-based target point set representation model for detection, saving the point sets and corresponding bounding boxes of all targets output by the model, and saving the features output by the classification branch.
7. The method of claim 6, wherein S120 comprises: for each point in the point set, computing the intersection of the bounding box with a rectangle centered at the point whose side length equals the minimum side of the bounding box, and taking the intersection as the target local region corresponding to that point; after a target local region has been obtained for every point in the point set, establishing an edge between two nodes if their target local regions intersect.
8. The method of claim 7, wherein S130 comprises:
mapping the minimum enclosing rectangle of the target local region corresponding to each point in the point set onto the feature map obtained from the classification branch, and extracting the features of the target local region;
and generating fixed-size features by rotated feature alignment of the target local region, and flattening them into a one-dimensional feature vector that serves as the features of the graph node.
9. The method of claim 8, wherein the graph convolutional neural network model comprises three graph convolution layers, each of which aggregates the features of graph nodes within a first-order neighborhood.
10. The method of claim 9, wherein aggregating and updating the features of the graph nodes and integrating the features of all graph nodes for graph classification in S140 comprises:
converting the features of the graph nodes into higher-level features by a linear transformation;
computing the convolution on the graph from the transformed features, and aggregating and updating the features of the graph nodes;
after the graph convolution layers, concatenating, for each graph node, the features output by each graph convolution layer, and then applying global max pooling over the features of all graph nodes in the whole graph to obtain the feature vector of the whole graph;
and passing the feature vector of the whole graph to a multi-layer perceptron composed of several fully connected layers to obtain the graph classification result, thereby achieving fine-grained recognition of the target.
CN202310466470.6A 2023-04-26 2023-04-26 Target fine granularity identification method combining point set representation and graph classification Active CN116486238B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310466470.6A CN116486238B (en) 2023-04-26 2023-04-26 Target fine granularity identification method combining point set representation and graph classification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310466470.6A CN116486238B (en) 2023-04-26 2023-04-26 Target fine granularity identification method combining point set representation and graph classification

Publications (2)

Publication Number Publication Date
CN116486238A 2023-07-25
CN116486238B 2023-09-15

Family

ID=87213485

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310466470.6A Active CN116486238B (en) 2023-04-26 2023-04-26 Target fine granularity identification method combining point set representation and graph classification

Country Status (1)

Country Link
CN (1) CN116486238B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117173422A (en) * 2023-08-07 2023-12-05 广东第二师范学院 Fine granularity image recognition method based on graph fusion multi-scale feature learning

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222611A (en) * 2019-05-27 2019-09-10 中国科学院自动化研究所 Human skeleton Activity recognition method, system, device based on figure convolutional network
CN110674869A (en) * 2019-09-23 2020-01-10 腾讯科技(深圳)有限公司 Classification processing and graph convolution neural network model training method and device
CN110705425A (en) * 2019-09-25 2020-01-17 广州西思数字科技有限公司 Tongue picture multi-label classification learning method based on graph convolution network
CN113449802A (en) * 2021-07-09 2021-09-28 中国人民解放军国防科技大学 Graph classification method and device based on multi-granularity mutual information maximization
CN113723572A (en) * 2021-11-01 2021-11-30 中南大学 Ship target identification method, computer system, program product and storage medium
CN114399731A (en) * 2021-12-31 2022-04-26 中国科学院大学 Target positioning method under single-coarse-point supervision
CN114821374A (en) * 2022-06-27 2022-07-29 中国电子科技集团公司第二十八研究所 Knowledge and data collaborative driving unmanned aerial vehicle aerial photography target detection method
CN114863299A (en) * 2022-04-28 2022-08-05 哈尔滨理工大学 Fine identification system for aerial image target
WO2022178210A1 (en) * 2021-02-19 2022-08-25 Nnaisense, Sa Clustered dynamic graph convolutional neural network (cnn) for biometric three-dimensional (3d) hand recognition
CN115272856A (en) * 2022-07-28 2022-11-01 北京卫星信息工程研究所 Ship target fine-grained identification method and equipment
CN115908908A (en) * 2022-11-14 2023-04-04 北京卫星信息工程研究所 Remote sensing image gathering type target identification method and device based on graph attention network

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222611A (en) * 2019-05-27 2019-09-10 中国科学院自动化研究所 Human skeleton Activity recognition method, system, device based on figure convolutional network
CN110674869A (en) * 2019-09-23 2020-01-10 腾讯科技(深圳)有限公司 Classification processing and graph convolution neural network model training method and device
CN110705425A (en) * 2019-09-25 2020-01-17 广州西思数字科技有限公司 Tongue picture multi-label classification learning method based on graph convolution network
WO2022178210A1 (en) * 2021-02-19 2022-08-25 Nnaisense, Sa Clustered dynamic graph convolutional neural network (cnn) for biometric three-dimensional (3d) hand recognition
CN113449802A (en) * 2021-07-09 2021-09-28 中国人民解放军国防科技大学 Graph classification method and device based on multi-granularity mutual information maximization
CN113723572A (en) * 2021-11-01 2021-11-30 中南大学 Ship target identification method, computer system, program product and storage medium
CN114399731A (en) * 2021-12-31 2022-04-26 中国科学院大学 Target positioning method under single-coarse-point supervision
CN114863299A (en) * 2022-04-28 2022-08-05 哈尔滨理工大学 Fine identification system for aerial image target
CN114821374A (en) * 2022-06-27 2022-07-29 中国电子科技集团公司第二十八研究所 Knowledge and data collaborative driving unmanned aerial vehicle aerial photography target detection method
CN115272856A (en) * 2022-07-28 2022-11-01 北京卫星信息工程研究所 Ship target fine-grained identification method and equipment
CN115908908A (en) * 2022-11-14 2023-04-04 北京卫星信息工程研究所 Remote sensing image gathering type target identification method and device based on graph attention network

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
TINGTING FANG et al.: "Affinity-Aware Relation Network for Oriented Object Detection in Aerial Images", ACCV 2022, pages 369-386 *
WENTONG LI et al.: "Oriented RepPoints for Aerial Object Detection", arXiv, pages 1-10 *
XIWEI YANG et al.: "GRDN: Graph Relation Decision Network for Object Detection", ICMI 2022, pages 1-6 *
WU CHENXI et al.: "Point cloud classification model for transmission lines based on sliding graph convolutional neural network", Advanced Technology of Electrical Engineering and Energy, vol. 41, no. 12, pages 28-34 *
DU YANLING et al.: "Improved anchor-free algorithm for detecting arbitrarily oriented aircraft targets in color remote sensing images", Chinese Journal of Liquid Crystals and Displays, vol. 38, no. 3, pages 409-417 *
LONG YUJIANG et al.: "Detection method for damaged rotating insulators based on adaptive keypoints", Computer Engineering, pages 1-9 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117173422A (en) * 2023-08-07 2023-12-05 广东第二师范学院 Fine granularity image recognition method based on graph fusion multi-scale feature learning
CN117173422B (en) * 2023-08-07 2024-02-13 广东第二师范学院 Fine granularity image recognition method based on graph fusion multi-scale feature learning

Also Published As

Publication number Publication date
CN116486238B (en) 2023-09-15

Similar Documents

Publication Publication Date Title
CN103679674B (en) Method and system for splicing images of unmanned aircrafts in real time
Li et al. Adaptive deep convolutional neural networks for scene-specific object detection
CN106910202B (en) Image segmentation method and system for ground object of remote sensing image
CN111368690A (en) Deep learning-based video image ship detection method and system under influence of sea waves
CN110210431A (en) A kind of point cloud classifications method based on cloud semantic tagger and optimization
CN116486238B (en) Target fine granularity identification method combining point set representation and graph classification
CN112348758B (en) Optical remote sensing image data enhancement method and target identification method
Li et al. Aligning discriminative and representative features: An unsupervised domain adaptation method for building damage assessment
Rostami et al. Deep learning-based face detection and recognition on drones
Zhang et al. A object detection and tracking method for security in intelligence of unmanned surface vehicles
Ribeiro et al. Underwater place recognition in unknown environments with triplet based acoustic image retrieval
CN111626120A (en) Target detection method based on improved YOLO-6D algorithm in industrial environment
Zhu et al. Arbitrary-oriented ship detection based on retinanet for remote sensing images
CN115147745A (en) Small target detection method based on urban unmanned aerial vehicle image
CN114359622A (en) Image classification method based on convolution neural network-converter hybrid architecture
Pessanha Santos et al. Two‐stage 3D model‐based UAV pose estimation: A comparison of methods for optimization
Kaur et al. A systematic review of object detection from images using deep learning
Chen et al. Ship tracking for maritime traffic management via a data quality control supported framework
Ya et al. Fusion object detection of satellite imagery with arbitrary-oriented region convolutional neural network
Zheng et al. Multiscale Fusion Network for Rural Newly Constructed Building Detection in Unmanned Aerial Vehicle Imagery
Liu et al. Generating Pixel Enhancement for Road Extraction in High-Resolution Aerial Images
Yang et al. An Effective and Lightweight Hybrid Network for Object Detection in Remote Sensing Images
Bupphawat et al. Super-resolution land cover mapping based on deep learning and level set method
CN111401203A (en) Target identification method based on multi-dimensional image fusion
Pavlove et al. Efficient Deep Learning Methods for Automated Visibility Estimation at Airports

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant