CN114359627A - Target detection post-processing method and device based on graph convolution - Google Patents

Target detection post-processing method and device based on graph convolution

Info

Publication number
CN114359627A
Authority
CN
China
Prior art keywords
graph
target detection
rectangular frame
graph convolution
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111536248.6A
Other languages
Chinese (zh)
Other versions
CN114359627B (en)
Inventor
李军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Inspector Intelligent Technology Co ltd
Original Assignee
Nanjing Inspector Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Inspector Intelligent Technology Co ltd filed Critical Nanjing Inspector Intelligent Technology Co ltd
Priority to CN202111536248.6A priority Critical patent/CN114359627B/en
Publication of CN114359627A publication Critical patent/CN114359627A/en
Application granted granted Critical
Publication of CN114359627B publication Critical patent/CN114359627B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a target detection post-processing method and a target detection post-processing device based on graph convolution. The method specifically comprises the following steps. Step 1, training stage: training to obtain a graph convolutional neural network model, by (1) screening out the best-matching prediction boxes and (2) predicting the best-matching set of rectangular boxes using graph convolution. Step 2, prediction stage: for each detected picture, the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold is denoted B; B is constructed as a graph and passed through the trained graph convolutional neural network model; if a node's score for prediction category 1 is greater than a preset threshold, the rectangular box corresponding to that node is retained, and the set of retained rectangular boxes is the final output result. By replacing the NMS operation with a graph convolution operation, no preset NMS threshold is needed, and the post-processing performance of the target detection model can be greatly improved by using both the feature information of each rectangular box and its context information.

Description

Target detection post-processing method and device based on graph convolution
Technical Field
The invention relates to the field of image recognition research, in particular to the field of target detection and the field of deep learning, and particularly relates to a target detection post-processing method and device based on graph convolution.
Background
The post-processing stage of a target detection model comprises two steps: first, prediction results with category scores below a threshold are filtered out; then, an NMS (non-maximum suppression) operation is used to filter overlapping rectangular boxes and obtain the final target detection prediction result. Because the NMS operation uses only the position information of the rectangular boxes, it is easily affected by the preset IoU threshold: if the threshold is too large, several rectangular boxes are easily output for the same target, reducing precision; if the threshold is too small, only one rectangular box may be output for two adjacent targets, reducing recall.
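For contrast, the greedy NMS post-processing criticized above can be sketched as follows. This is a minimal illustration, not the patent's method; the box format (x1, y1, x2, y2) and the helper names are assumptions:

```python
import numpy as np

def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh):
    # Greedy NMS: repeatedly keep the highest-scoring remaining box and
    # drop every box that overlaps it by more than iou_thresh.
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order) > 0:
        i = order[0]
        keep.append(int(i))
        order = np.array([j for j in order[1:]
                          if iou(boxes[i], boxes[j]) <= iou_thresh])
    return keep
```

The `iou_thresh` parameter here is exactly the preset threshold whose sensitivity the background section describes: raising it keeps duplicate boxes, lowering it merges adjacent targets.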
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a target detection post-processing method and device based on graph convolution, which can greatly improve the post-processing performance of a target detection model. The technical scheme is as follows:
the invention provides a target detection post-processing method based on graph convolution, which specifically comprises the following steps:
step 1, training stage: training to obtain a graph convolution neural network model;
(1) screening out the best matching prediction box;
For each training picture, the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold is denoted B = {b1, b2, …, bn}, and the set of rectangular boxes of all real targets in the picture is denoted G = {g1, g2, …, gm}.
A weighted bipartite graph is constructed using the sets B and G. The weight of the edge between vertex bi in set B and vertex gj in set G is defined as the IoU value IoU(bi, gj) between bi and gj, i ∈ {1, 2, …, n}, j ∈ {1, 2, …, m}:

IoU(bi, gj) = area(bi ∩ gj) / area(bi ∪ gj)
The best matching is solved using the KM algorithm so that the total weight of the matching result is maximal. In the best matching result, the set of elements belonging to set B is denoted B′ = {b′1, b′2, …, b′r}, r ≤ m.
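The KM step maximizes the total IoU weight of the matching. As an illustration of that objective only (not of the KM algorithm itself, which solves it in polynomial time), a brute-force search over assignments for tiny inputs can be sketched as:

```python
from itertools import permutations

def best_match(iou_matrix):
    # iou_matrix[i][j] = IoU(b_i, g_j), for n predicted boxes and m
    # ground-truth boxes with n >= m.  Return, for each ground truth,
    # the index of the prediction assigned to it so that the total
    # weight is maximal -- the same objective the KM algorithm solves.
    # Exhaustive search is only viable for very small n and m.
    n, m = len(iou_matrix), len(iou_matrix[0])
    best_perm, best_w = None, -1.0
    for perm in permutations(range(n), m):  # one prediction per ground truth
        w = sum(iou_matrix[perm[j]][j] for j in range(m))
        if w > best_w:
            best_perm, best_w = perm, w
    return list(best_perm), best_w
```

In practice a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` (with `maximize=True`) computes the same optimal assignment.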
(2) Predicting a best matching rectangular box set B' by using graph convolution;
For each training picture, the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold is denoted B = {b1, b2, …, bn}. The set B itself can also be constructed as a graph P, whose node set is denoted V = {v1, v2, …, vn}; the number of elements of set V is the same as that of set B, and the initial feature vector of any node vi is the feature vector corresponding to the rectangular box bi predicted by the target detection model. All nodes in the graph P form a feature matrix H, H ∈ R^(n×p), where n is the number of nodes and p is the dimension of the feature vector with which the target detection model predicts the position of a rectangular box.
The nodes in the graph P are connected to each other pairwise, forming the edge set E = {e1, e2, …, ek} of the graph P, where k = n². The graph P is described by an adjacency matrix A, A ∈ R^(n×n); the element Aij of the adjacency matrix, representing the edge between node vi and node vj, is the IoU value between rectangular box bi and rectangular box bj:

Aij = IoU(bi, bj) = area(bi ∩ bj) / area(bi ∪ bj)
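The adjacency matrix defined above can be built directly from pairwise IoU values. A minimal sketch, assuming boxes in (x1, y1, x2, y2) format:

```python
import numpy as np

def iou(a, b):
    # IoU of two axis-aligned boxes (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def adjacency(boxes):
    # A[i, j] = IoU(b_i, b_j); the diagonal is 1 because every box
    # overlaps itself completely.  All n^2 node pairs get an entry,
    # matching the fully connected edge set E with k = n^2.
    n = len(boxes)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            A[i, j] = iou(boxes[i], boxes[j])
    return A
```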
A multilayer graph convolutional neural network is defined. The number of layers of the network is denoted L, and the operation of each graph convolution layer is H^(l+1) = σ(A H^l W^l), where H^l is the feature matrix of the l-th layer, W^l is the weight matrix of the l-th layer, and σ is the activation function.
After the graph P has undergone the multilayer graph convolution operation, a graph P′ results. For each node v′i in the graph P′: if its corresponding rectangular box bi ∈ B′, its category is set to 1, indicating that the rectangular box corresponding to the node needs to be retained; otherwise its category is set to 0, indicating that the rectangular box corresponding to the node does not need to be retained. The cross-entropy loss of each node in the graph P′ is computed using a softmax function, and the graph convolutional neural network model is trained with an optimization function until the model converges, yielding the trained graph convolutional neural network model.
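The layer rule H^(l+1) = σ(A H^l W^l) can be sketched in NumPy. ReLU is used as the activation, as the embodiments suggest; the weight shapes are illustrative:

```python
import numpy as np

def gcn_forward(A, H, weights):
    # Stacked graph convolution layers: H^{l+1} = ReLU(A @ H^l @ W^l).
    # A: (n, n) adjacency matrix, H: (n, p) node feature matrix,
    # weights: list of per-layer weight matrices W^l.
    for W in weights:
        H = np.maximum(A @ H @ W, 0.0)
    return H
```

The final layer would map to two classes per node (keep / discard), so the last W^l has two output columns.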
Step 2, prediction phase
For each detected picture, the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold is denoted B = {b1, b2, …, bn}. B is constructed as a graph, denoted P1, in the same way as in step 1 (2). Using the trained graph convolutional neural network model, forward inference is performed on the graph P1 to obtain a graph P′1. For any node v′i of the graph P′1, if its score for prediction category 1 is greater than the preset threshold, the rectangular box bi corresponding to the node is retained; the set of all retained rectangular boxes is the final output result.
Preferably, the activation function is a ReLU function.
Preferably, the optimization function uses an SGD optimization function or an Adam optimization function.
Compared with the prior art, the technical scheme has the following beneficial effects: by replacing the NMS operation with a graph convolution operation, no preset NMS threshold is needed, and the post-processing performance of the target detection model can be greatly improved by using both the feature information of each rectangular box and the context information of the rectangular boxes around it.
Detailed Description
In order to clarify the technical solution and working principle of the present invention, the embodiments of the present disclosure are described in further detail below. All the optional technical solutions above may be combined arbitrarily to form the optional embodiments of the present disclosure; they are not described again here.
The terms "step 1," "step 2," "step 3," and the like in the description and claims of this application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that the embodiments of the application described herein may be practiced in sequences other than those described herein.
In a first aspect, the embodiment of the disclosure provides a target detection post-processing method based on graph convolution. Based on a target detection model built on a deep convolutional neural network, a picture is processed in sequence by a backbone module, a neck module and a head module; the model outputs the predicted category and position, the position being described as a rectangular box; and the final position result is obtained by processing the output set of rectangular boxes with graph convolution.
the method specifically comprises the following steps:
step 1, training stage: training to obtain a graph convolution neural network model;
(1) screening out the best matching prediction box;
For each training picture, the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold is denoted B = {b1, b2, …, bn}, and the set of rectangular boxes of all real targets in the picture is denoted G = {g1, g2, …, gm}.
A weighted bipartite graph is constructed using the sets B and G. The weight of the edge between vertex bi in set B and vertex gj in set G is defined as the IoU value between bi and gj, i ∈ {1, 2, …, n}, j ∈ {1, 2, …, m}:

IoU(bi, gj) = area(bi ∩ gj) / area(bi ∪ gj)
The KM algorithm (Kuhn-Munkres algorithm) is used to solve the best matching so that the total weight of the matching result is maximal. In the best matching result, the set of elements belonging to set B is denoted B′ = {b′1, b′2, …, b′r}, r ≤ m; the set B′ solved by the KM algorithm is the optimal result that an NMS post-processing algorithm could produce.
(2) Predicting a best matching rectangular box set B' by using graph convolution;
For each training picture, the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold is denoted B = {b1, b2, …, bn}. The set B itself can also be constructed as a graph P, whose node set is denoted V = {v1, v2, …, vn}; the number of elements of set V is the same as that of set B, and the initial feature vector of any node vi is the feature vector corresponding to the rectangular box bi predicted by the target detection model. All nodes in the graph P form a feature matrix H, H ∈ R^(n×p), where n is the number of nodes and p is the dimension of the feature vector with which the target detection model predicts the position of a rectangular box.
The nodes in the graph P are connected to each other pairwise, forming the edge set E = {e1, e2, …, ek} of the graph P, where k = n². The graph P is described by an adjacency matrix A, A ∈ R^(n×n); the element Aij of the adjacency matrix, representing the edge between node vi and node vj, is the IoU value between rectangular box bi and rectangular box bj:

Aij = IoU(bi, bj) = area(bi ∩ bj) / area(bi ∪ bj)
A multilayer graph convolutional neural network is defined. The number of layers of the network is denoted L, and the operation of each graph convolution layer is H^(l+1) = σ(A H^l W^l), where H^l is the feature matrix of the l-th layer, W^l is the weight matrix of the l-th layer, and σ is the activation function; a ReLU function is generally used.
After the graph P has undergone the multilayer graph convolution operation, a graph P′ results. For each node v′i in the graph P′: if its corresponding rectangular box bi ∈ B′, its category is set to 1, indicating that the rectangular box corresponding to the node needs to be retained; otherwise its category is set to 0, indicating that the rectangular box corresponding to the node does not need to be retained. The cross-entropy loss of each node in the graph P′ is computed using a softmax function, and the graph convolutional neural network model is trained with an optimization function until the model converges, yielding the trained graph convolutional neural network model. Preferably, the optimization function is an SGD optimization function or an Adam optimization function, although other optimization functions are possible.
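The per-node softmax cross-entropy loss used in training can be sketched as follows (labels: 1 = keep the box, 0 = discard; the numerically stabilized softmax is a standard implementation choice, not stated in the patent):

```python
import numpy as np

def node_cross_entropy(logits, labels):
    # logits: (n, 2) class scores per node; labels: (n,) values in {0, 1}.
    # Softmax over the two classes, then mean negative log-likelihood
    # of the true class over all nodes.
    z = logits - logits.max(axis=1, keepdims=True)  # stabilize exp()
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    n = len(labels)
    return -np.log(probs[np.arange(n), labels]).mean()
```

An SGD or Adam optimizer, as the text prefers, would then minimize this loss over the graph convolution weights.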
Step 2, prediction phase
For each detected picture, the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold is denoted B = {b1, b2, …, bn}. B is constructed as a graph, denoted P1, in the same way as in step 1 (2). Using the trained graph convolutional neural network model, forward inference is performed on the graph P1 to obtain a graph P′1. For any node v′i of the graph P′1, if its score for prediction category 1 is greater than the preset threshold, the rectangular box bi corresponding to the node is retained; the set of all retained rectangular boxes is the final output result.
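The prediction-stage rule — keep box bi whenever its node's class-1 score exceeds the threshold — can be sketched as follows; the 0.5 threshold in the usage below is an assumed value, not one specified by the patent:

```python
def keep_boxes(boxes, node_probs, thresh):
    # node_probs[i] = (p_discard, p_keep) for node v'_i after the GCN
    # forward pass; retain a box whenever its "keep" probability
    # exceeds the threshold.
    return [b for b, p in zip(boxes, node_probs) if p[1] > thresh]
```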
In a second aspect, the embodiment of the present disclosure provides a target detection post-processing apparatus based on graph convolution. Based on the same technical concept, the apparatus can implement or execute the target detection post-processing method based on graph convolution in any of the possible implementations above.
The target detection post-processing device based on graph convolution and the target detection post-processing method based on graph convolution provided by the embodiment belong to the same concept, and specific implementation processes are detailed in the method embodiment and are not described herein again.
The invention has been described above by way of example. Obviously, the specific implementation of the invention is not limited to the manner described above; various insubstantial improvements made using the method concept and technical solution of the invention, or direct application of the concept and solution to other occasions without improvement, all fall within the protection scope of the invention.

Claims (4)

1. A target detection post-processing method based on graph convolution is characterized by comprising the following steps:
step 1, training stage: training to obtain a graph convolution neural network model;
(1) screening out the best matching prediction box;
for each training picture, denoting the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold as B = {b1, b2, …, bn}, and denoting the set of rectangular boxes of all real targets in the picture as G = {g1, g2, …, gm};
constructing a weighted bipartite graph using the sets B and G, where the weight of the edge between vertex bi in set B and vertex gj in set G is defined as the IoU value IoU(bi, gj) between bi and gj, i ∈ {1, 2, …, n}, j ∈ {1, 2, …, m}:

IoU(bi, gj) = area(bi ∩ gj) / area(bi ∪ gj);
solving the best matching using the KM algorithm so that the total weight of the matching result is maximal, where, in the best matching result, the set of elements belonging to set B is denoted B′ = {b′1, b′2, …, b′r}, r ≤ m;
(2) Predicting a best matching rectangular box set B' by using graph convolution;
for each training picture, denoting the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold as B = {b1, b2, …, bn}; the set B itself can also be constructed as a graph P, whose node set is denoted V = {v1, v2, …, vn}, the number of elements of set V being the same as that of set B, and the initial feature vector of any node vi being the feature vector corresponding to the rectangular box bi predicted by the target detection model; all nodes in the graph P form a feature matrix H, H ∈ R^(n×p), where n is the number of nodes and p is the dimension of the feature vector with which the target detection model predicts the position of a rectangular box;
the nodes in the graph P are connected to each other pairwise, forming the edge set E = {e1, e2, …, ek} of the graph P, where k = n²; the graph P is described by an adjacency matrix A, A ∈ R^(n×n), where the element Aij, representing the edge between node vi and node vj, is the IoU value between rectangular box bi and rectangular box bj:

Aij = IoU(bi, bj) = area(bi ∩ bj) / area(bi ∪ bj);
defining a multilayer graph convolutional neural network, where the number of layers of the network is denoted L and the operation of each graph convolution layer is H^(l+1) = σ(A H^l W^l), H^l being the feature matrix of the l-th layer, W^l the weight matrix of the l-th layer, and σ the activation function;
after the graph P has undergone the multilayer graph convolution operation, a graph P′ results; for each node v′i in the graph P′, if its corresponding rectangular box bi ∈ B′, its category is set to 1, indicating that the rectangular box corresponding to the node needs to be retained, otherwise its category is set to 0, indicating that the rectangular box corresponding to the node does not need to be retained; computing the cross-entropy loss of each node in the graph P′ using a softmax function, and training the graph convolutional neural network model with an optimization function until the model converges, obtaining the trained graph convolutional neural network model;
step 2, prediction phase
for each detected picture, denoting the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold as B = {b1, b2, …, bn}; constructing B as a graph, denoted P1, in the same way as in step 1 (2); using the trained graph convolutional neural network model, performing forward inference on the graph P1 to obtain a graph P′1; for any node v′i of the graph P′1, if its score for prediction category 1 is greater than the preset threshold, retaining the rectangular box bi corresponding to the node, the set of all retained rectangular boxes being the final output result.
2. The method of claim 1, wherein the activation function is a ReLU function.
3. The graph convolution-based target detection post-processing method according to claim 2, wherein the optimization function uses an SGD optimization function or an Adam optimization function.
4. A graph convolution-based object detection post-processing device, which is characterized by being capable of implementing or executing a graph convolution-based object detection post-processing method according to any one of claims 1 to 3.
CN202111536248.6A 2021-12-15 2021-12-15 Target detection post-processing method and device based on graph convolution Active CN114359627B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111536248.6A CN114359627B (en) 2021-12-15 2021-12-15 Target detection post-processing method and device based on graph convolution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111536248.6A CN114359627B (en) 2021-12-15 2021-12-15 Target detection post-processing method and device based on graph convolution

Publications (2)

Publication Number Publication Date
CN114359627A true CN114359627A (en) 2022-04-15
CN114359627B CN114359627B (en) 2024-06-07

Family

ID=81099177

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111536248.6A Active CN114359627B (en) 2021-12-15 2021-12-15 Target detection post-processing method and device based on graph convolution

Country Status (1)

Country Link
CN (1) CN114359627B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126472A (en) * 2019-12-18 2020-05-08 南京信息工程大学 Improved target detection method based on SSD
CN111612051A (en) * 2020-04-30 2020-09-01 杭州电子科技大学 Weak supervision target detection method based on graph convolution neural network
CN112733680A (en) * 2020-12-31 2021-04-30 南京视察者智能科技有限公司 Model training method, extracting method and device for generating high-quality face image based on monitoring video stream and terminal equipment
CN112884064A (en) * 2021-03-12 2021-06-01 迪比(重庆)智能科技研究院有限公司 Target detection and identification method based on neural network
WO2021139069A1 (en) * 2020-01-09 2021-07-15 南京信息工程大学 General target detection method for adaptive attention guidance mechanism
CN113255615A (en) * 2021-07-06 2021-08-13 南京视察者智能科技有限公司 Pedestrian retrieval method and device for self-supervision learning
WO2021227091A1 (en) * 2020-05-15 2021-11-18 南京智谷人工智能研究院有限公司 Multi-modal classification method based on graph convolutional neural network

Also Published As

Publication number Publication date
CN114359627B (en) 2024-06-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant