CN114359627B - Target detection post-processing method and device based on graph convolution - Google Patents

Target detection post-processing method and device based on graph convolution

Info

Publication number
CN114359627B
CN114359627B (application CN202111536248.6A; also published as CN114359627A)
Authority
CN
China
Prior art keywords
graph
rectangular frame
target detection
node
picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111536248.6A
Other languages
Chinese (zh)
Other versions
CN114359627A (en)
Inventor
李军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Inspector Intelligent Technology Co ltd
Original Assignee
Nanjing Inspector Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Inspector Intelligent Technology Co ltd filed Critical Nanjing Inspector Intelligent Technology Co ltd
Priority to CN202111536248.6A priority Critical patent/CN114359627B/en
Publication of CN114359627A publication Critical patent/CN114359627A/en
Application granted granted Critical
Publication of CN114359627B publication Critical patent/CN114359627B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a target detection post-processing method and device based on graph convolution. The method specifically comprises the following steps. Step 1, training phase: train to obtain a graph convolutional neural network model; (1) screen out the best-matching prediction frames; (2) predict the best-matching set of rectangular frames using graph convolution. Step 2, prediction stage: for each detection picture, the set of rectangular frames predicted by the target detection model and filtered by the category score threshold is denoted as B; B is constructed as a graph and processed by the trained graph convolutional neural network model; if the score of predicted class 1 for a node is greater than a preset threshold, the rectangular frame corresponding to that node is retained, and the set of all retained rectangular frames is the final output result. By using the graph convolution operation to replace the NMS operation, the feature information of the rectangular frames and their context information can be exploited without a preset NMS threshold, which can greatly improve the post-processing performance of the target detection model.

Description

Target detection post-processing method and device based on graph convolution
Technical Field
The invention relates to the field of image recognition, in particular to target detection and deep learning, and more particularly to a target detection post-processing method and device based on graph convolution.
Background
The post-processing stage of a target detection model comprises two steps: first, prediction results whose category score is lower than a threshold are filtered out; then the overlapping rectangular frames are filtered by the NMS (non-maximum suppression) operation to obtain the final target detection prediction result. Because the NMS operation uses only the position information of the rectangular frames, it is strongly affected by the preset threshold: if the threshold is too large, multiple rectangular frames are easily output for the same target and the accuracy is reduced; if the threshold is too small, two adjacent targets are likely to yield only one rectangular frame and the recall rate is reduced.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a target detection post-processing method and device based on graph convolution, which can greatly improve the post-processing performance of a target detection model. The technical solution is as follows:
The invention provides a target detection post-processing method based on graph convolution, which specifically comprises the following steps:
step 1, training phase: training to obtain a graph convolution neural network model;
(1) Screening out a prediction frame with the best match;
For each training picture, the set of rectangular frames predicted by the target detection model and filtered by the category score threshold is denoted as B = {b_1, b_2, …, b_n}, and the set of rectangular frames of all real targets in the picture is denoted as G = {g_1, g_2, …, g_m}.
A weighted bipartite graph is constructed using set B and set G, where the weight of the edge between vertex b_i in set B and vertex g_j in set G is defined as the IoU(b_i, g_j) value between b_i and g_j, i ∈ {1, 2, …, n}, j ∈ {1, 2, …, m}.
The best match is solved using the KM algorithm so that the total weight of the matching result is maximized; in the best matching result, the set of elements belonging to set B is denoted as B' = {b'_1, b'_2, …, b'_r}, r ≤ m.
(2) Predicting the best-matching rectangular frame set B' using graph convolution;
For each training picture, the set of rectangular frames predicted by the target detection model and filtered by the category score threshold is denoted as B = {b_1, b_2, …, b_n}. The set B itself is constructed as a graph, denoted graph P; the node set in graph P is denoted V = {v_1, v_2, …, v_n}, the number of elements in set V being equal to the number of elements in set B, and for any element v_i in set V, its initial feature vector is the feature vector corresponding to the rectangular frame b_i predicted by the target detection model. All nodes in graph P form a feature matrix H, H ∈ R^(n×p), where n is the number of nodes and p is the dimension of the feature vector at the position of the rectangular frame predicted by the target detection model.
The nodes in graph P are pairwise connected, forming the edge set E = {e_1, e_2, …, e_k} of graph P, where k = n^2. Graph P is described by the adjacency matrix A, A ∈ R^(n×n); element a_ij of the adjacency matrix A represents the edge between node v_i and node v_j, and its value is the IoU between rectangular frame b_i and rectangular frame b_j: a_ij = IoU(b_i, b_j).
A multi-layer graph convolutional neural network is defined, the number of layers being denoted L. The operation of each graph-convolution layer is defined as H^(l+1) = σ(A H^l W^l), where H^l is the feature matrix of layer l, W^l is the weight matrix of layer l, and σ is the activation function.
After graph P passes through the multi-layer graph convolution operation, graph P' is obtained. For each node v'_i in graph P', if its corresponding rectangular frame b_i ∈ B', its class is set to 1, indicating that the rectangular frame corresponding to the node needs to be retained; otherwise its class is set to 0, indicating that the rectangular frame corresponding to the node does not need to be retained. For each node in graph P', the cross-entropy loss is calculated using a softmax function, and the graph convolutional neural network model is trained with an optimization function until the model converges, yielding the trained graph convolutional neural network model.
Step 2, prediction stage
For each detection picture, the set of rectangular frames predicted by the target detection model and filtered by the category score threshold is denoted as B = {b_1, b_2, …, b_n}; B is constructed as a graph, denoted graph P_1, in the same manner as in step 1 (2). Forward inference is performed on graph P_1 using the trained graph convolutional neural network model to obtain graph P'_1. For any node v'_i in graph P'_1, if the score of predicted class 1 is greater than a preset threshold, the rectangular frame b_i corresponding to the node is retained; the set of all retained rectangular frames is the final output result.
Preferably, the activation function employs a ReLU function.
Preferably, the optimization function uses an SGD optimization function or an Adam optimization function.
Compared with the prior art, the above technical solution has the following beneficial effect: by using the graph convolution operation to replace the NMS operation, the feature information of the rectangular frames and their context information can be exploited without a preset NMS threshold, which can greatly improve the post-processing performance of the target detection model.
Detailed Description
In order to clarify the technical scheme and working principle of the present invention, the following describes the embodiments of the present disclosure in further detail. Any combination of the above-mentioned optional solutions may be adopted to form an optional embodiment of the present disclosure, which is not described herein in detail.
The terms "step 1," "step 2," "step 3," and the like in the description and in the claims are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those described herein.
First aspect: the embodiments of the present disclosure provide a target detection post-processing method based on graph convolution. Based on a target detection model built on a deep convolutional neural network, the picture is processed sequentially by a backbone module, a neck module and a head module; the model outputs the predicted classes and positions, the positions are described in the form of rectangular frames, and the output set of rectangular frames is processed with graph convolution to obtain the final position result;
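As an illustrative sketch only (not the patent's reference implementation), the post-processing could slot in after the detector head roughly as follows; the detector interface, gcn_postprocess and all other names are hypothetical placeholders, with gcn_postprocess standing for the graph-convolution step sketched later in this section.

    # Illustrative end-to-end flow only: a detector (backbone -> neck -> head) yields
    # boxes, category scores and per-box features; boxes above the category score
    # threshold are handed to the graph-convolution post-processing instead of NMS.
    # All function names are hypothetical placeholders.
    def detect(image, detector, gcn_model, score_thresh=0.3):
        boxes, scores, feats = detector(image)                         # backbone/neck/head output
        keep = [i for i, s in enumerate(scores) if s > score_thresh]   # category score filter
        boxes = [boxes[i] for i in keep]
        feats = [feats[i] for i in keep]
        # Graph-convolution post-processing replaces NMS (sketched further below).
        return gcn_postprocess(gcn_model, boxes, feats)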
The method specifically comprises the following steps:
step 1, training phase: training to obtain a graph convolution neural network model;
(1) Screening out a prediction frame with the best match;
For each training picture, the set of rectangular frames predicted by the target detection model and filtered by the category score threshold is denoted as B = {b_1, b_2, …, b_n}, and the set of rectangular frames of all real targets in the picture is denoted as G = {g_1, g_2, …, g_m}.
A weighted bipartite graph is constructed using set B and set G, where the weight of the edge between vertex b_i in set B and vertex g_j in set G is defined as the IoU(b_i, g_j) value between b_i and g_j, i ∈ {1, 2, …, n}, j ∈ {1, 2, …, m}.
The best match is solved using the KM algorithm (Kuhn-Munkres algorithm) so that the total weight of the matching result is maximized. In the best matching result, the set of elements belonging to set B is denoted as B' = {b'_1, b'_2, …, b'_r}, r ≤ m; the set B' obtained by the KM algorithm is the ideal result that an NMS post-processing algorithm should output.
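A minimal sketch of this matching step is given below, assuming boxes in (x1, y1, x2, y2) format; scipy's linear_sum_assignment solves the same maximum-weight assignment problem as the KM algorithm, and the helper names are illustrative.

    # Hypothetical sketch of step 1 (1): maximum-weight bipartite matching between
    # predicted boxes B and ground-truth boxes G using IoU as the edge weight.
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def iou_matrix(boxes_a, boxes_b):
        """Pairwise IoU between two arrays of boxes in (x1, y1, x2, y2) format."""
        a = np.asarray(boxes_a, dtype=np.float64)   # shape (n, 4)
        b = np.asarray(boxes_b, dtype=np.float64)   # shape (m, 4)
        x1 = np.maximum(a[:, None, 0], b[None, :, 0])
        y1 = np.maximum(a[:, None, 1], b[None, :, 1])
        x2 = np.minimum(a[:, None, 2], b[None, :, 2])
        y2 = np.minimum(a[:, None, 3], b[None, :, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
        area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
        union = area_a[:, None] + area_b[None, :] - inter
        return inter / np.maximum(union, 1e-9)      # shape (n, m)

    def best_match_indices(pred_boxes, gt_boxes, min_iou=1e-6):
        """Return indices into pred_boxes forming the best-matching set B'."""
        w = iou_matrix(pred_boxes, gt_boxes)                 # bipartite edge weights
        rows, cols = linear_sum_assignment(w, maximize=True) # KM-style assignment
        # Keep only pairs with positive IoU so unmatched boxes are not labelled 1.
        return [r for r, c in zip(rows, cols) if w[r, c] > min_iou]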
(2) Predicting the best-matching rectangular frame set B' using graph convolution;
For each training picture, the set of rectangular frames predicted by the target detection model and filtered by the category score threshold is denoted as B = {b_1, b_2, …, b_n}. The set B itself is constructed as a graph, denoted graph P; the node set in graph P is denoted V = {v_1, v_2, …, v_n}, the number of elements in set V being equal to the number of elements in set B, and for any element v_i in set V, its initial feature vector is the feature vector corresponding to the rectangular frame b_i predicted by the target detection model. All nodes in graph P form a feature matrix H, H ∈ R^(n×p), where n is the number of nodes and p is the dimension of the feature vector at the position of the rectangular frame predicted by the target detection model.
The nodes in graph P are pairwise connected, forming the edge set E = {e_1, e_2, …, e_k} of graph P, where k = n^2. Graph P is described by the adjacency matrix A, A ∈ R^(n×n); element a_ij of the adjacency matrix A represents the edge between node v_i and node v_j, and its value is the IoU between rectangular frame b_i and rectangular frame b_j: a_ij = IoU(b_i, b_j).
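A short sketch of this graph construction follows, reusing the iou_matrix helper from the matching sketch above; the function name and return convention are assumptions.

    # Hypothetical sketch of the graph construction in step 1 (2): every pair of
    # predicted boxes is connected, and adjacency entry a_ij is IoU(b_i, b_j).
    import numpy as np

    def build_graph(pred_boxes, box_features):
        """Return (H, A): node feature matrix H (n x p) and adjacency matrix A (n x n)."""
        H = np.asarray(box_features, dtype=np.float32)        # one feature vector per box
        A = iou_matrix(pred_boxes, pred_boxes).astype(np.float32)
        # The diagonal IoU(b_i, b_i) equals 1 and conveniently acts as a self-loop.
        return H, A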
A multi-layer graph convolutional neural network is defined, the number of layers being denoted L. The operation of each graph-convolution layer is defined as H^(l+1) = σ(A H^l W^l), where H^l is the feature matrix of layer l, W^l is the weight matrix of layer l, and σ is the activation function; a ReLU function is typically used.
After the multi-layer graph convolution operation, graph P' is obtained. For each node v'_i in graph P', if its corresponding rectangular frame b_i ∈ B', its class is set to 1, indicating that the rectangular frame corresponding to the node needs to be retained; otherwise its class is set to 0, indicating that the rectangular frame corresponding to the node does not need to be retained. For each node in graph P', the cross-entropy loss is calculated using a softmax function, and the graph convolutional neural network model is trained with an optimization function until the model converges, yielding the trained graph convolutional neural network model. Preferably, the optimization function is the SGD optimization function or the Adam optimization function, although other optimization functions are also possible.
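The layer rule H^(l+1) = σ(A H^l W^l) and the per-node softmax / cross-entropy training can be sketched in PyTorch roughly as follows; the plain A·H·W propagation follows the formula exactly as stated above (without the degree normalization used by some GCN variants), and the class names, layer sizes and hyper-parameters are illustrative assumptions.

    # Minimal PyTorch sketch of the multi-layer graph convolution H^(l+1) = sigma(A H^l W^l)
    # with a two-class (keep / discard) output per node, trained with softmax cross-entropy.
    import torch
    import torch.nn as nn

    class GraphConvLayer(nn.Module):
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.weight = nn.Linear(in_dim, out_dim, bias=False)   # W^l

        def forward(self, A, H):
            return torch.relu(A @ self.weight(H))                  # sigma(A H^l W^l), ReLU activation

    class BoxGCN(nn.Module):
        """L-layer GCN that classifies each node (predicted box) as keep (1) or discard (0)."""
        def __init__(self, feat_dim, hidden_dim=128, num_layers=3):
            super().__init__()
            dims = [feat_dim] + [hidden_dim] * num_layers
            self.layers = nn.ModuleList(GraphConvLayer(i, o) for i, o in zip(dims[:-1], dims[1:]))
            self.classifier = nn.Linear(hidden_dim, 2)              # two classes per node

        def forward(self, A, H):
            for layer in self.layers:
                H = layer(A, H)
            return self.classifier(H)                               # logits of shape (n, 2)

    def train_step(model, optimizer, A, H, labels):
        """labels[i] = 1 if box b_i belongs to the best-matching set B', else 0."""
        optimizer.zero_grad()
        logits = model(A, H)
        loss = nn.functional.cross_entropy(logits, labels)          # softmax + cross-entropy loss
        loss.backward()
        optimizer.step()
        return loss.item()

    # Example optimizer choice (SGD as in the preferred embodiment; Adam also works):
    # model = BoxGCN(feat_dim=H.shape[1])
    # optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)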
Step 2, prediction stage
For each detection picture, the set of rectangular frames predicted by the target detection model and filtered by the category score threshold is denoted as B = {b_1, b_2, …, b_n}; B is constructed as a graph, denoted graph P_1, in the same manner as in step 1 (2). Forward inference is performed on graph P_1 using the trained graph convolutional neural network model to obtain graph P'_1. For any node v'_i in graph P'_1, if the score of predicted class 1 is greater than a preset threshold, the rectangular frame b_i corresponding to the node is retained; the set of all retained rectangular frames is the final output result.
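A sketch of this prediction stage follows, assuming the build_graph helper and the GCN model from the earlier sketches; the keep threshold of 0.5 is an illustrative default, not a value specified by the patent.

    # Hypothetical inference sketch: build graph P_1 for one test picture, run the
    # trained GCN forward, and keep every box whose class-1 ("keep") score exceeds
    # the preset threshold.
    import torch

    @torch.no_grad()
    def gcn_postprocess(model, pred_boxes, box_features, keep_thresh=0.5):
        H, A = build_graph(pred_boxes, box_features)                # graph P_1
        logits = model(torch.from_numpy(A), torch.from_numpy(H))    # forward pass -> graph P'_1
        keep_score = torch.softmax(logits, dim=1)[:, 1]             # per-node score of class 1
        keep_idx = (keep_score > keep_thresh).nonzero(as_tuple=True)[0].tolist()
        return [pred_boxes[i] for i in keep_idx]                    # final output boxes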
In a second aspect, embodiments of the present disclosure provide a target detection post-processing device based on graph convolution.
Based on the same technical concept, the device can realize or execute a target detection post-processing method based on graph convolution in any one of all possible implementation manners.
The target detection post-processing device based on graph convolution and the target detection post-processing method based on graph convolution provided in the foregoing embodiments belong to the same concept, and detailed implementation processes of the target detection post-processing device based on graph convolution are detailed in the method embodiments and are not described herein.
While the invention has been described above by way of example, the invention is obviously not limited to the particular embodiments described. Various insubstantial modifications made using the method concepts and technical solutions of the invention, as well as direct applications of the above concepts and solutions to other occasions without improvement, all fall within the protection scope of the invention.

Claims (4)

1. The target detection post-processing method based on graph convolution is characterized by comprising the following steps of:
step 1, training phase: training to obtain a graph convolution neural network model;
(1) Screening out a prediction frame with the best match;
for each training picture, the set of rectangular frames predicted by the target detection model and filtered by the category score threshold is denoted as B = {b_1, b_2, …, b_n}, and the set of rectangular frames of all real targets in the picture is denoted as G = {g_1, g_2, …, g_m};
constructing a weighted bipartite graph using set B and set G, wherein the weight of the edge between vertex b_i in set B and vertex g_j in set G is defined as the IoU(b_i, g_j) value between b_i and g_j, i ∈ {1, 2, …, n}, j ∈ {1, 2, …, m};
solving the best match using the KM algorithm so that the total weight of the matching result is maximized, wherein in the best matching result the set of elements belonging to set B is denoted as B' = {b'_1, b'_2, …, b'_r}, r ≤ m;
(2) Predicting the best-matching rectangular frame set B' using graph convolution;
for each training picture, the set of rectangular frames predicted by the target detection model and filtered by the category score threshold is denoted as B = {b_1, b_2, …, b_n}; the set B itself is constructed as a graph, denoted graph P; the node set in graph P is denoted V = {v_1, v_2, …, v_n}, the number of elements in set V being equal to the number of elements in set B, and for any element v_i in set V, its initial feature vector is the feature vector corresponding to the rectangular frame b_i predicted by the target detection model; all nodes in graph P form a feature matrix H, H ∈ R^(n×p), where n is the number of nodes and p is the dimension of the feature vector at the position of the rectangular frame predicted by the target detection model;
the nodes in graph P are pairwise connected, forming the edge set E = {e_1, e_2, …, e_k} of graph P, where k = n^2; graph P is described by the adjacency matrix A, A ∈ R^(n×n); element a_ij of the adjacency matrix A represents the edge between node v_i and node v_j, and its value is the IoU between rectangular frame b_i and rectangular frame b_j: a_ij = IoU(b_i, b_j);
defining a multi-layer graph convolutional neural network, the number of layers being denoted L, wherein the operation of each graph-convolution layer is defined as H^(l+1) = σ(A H^l W^l), H^l being the feature matrix of layer l, W^l being the weight matrix of layer l, and σ being the activation function;
after graph P passes through the multi-layer graph convolution operation, graph P' is obtained; for each node v'_i in graph P', if its corresponding rectangular frame b_i ∈ B', its class is set to 1, indicating that the rectangular frame corresponding to the node needs to be retained; otherwise its class is set to 0, indicating that the rectangular frame corresponding to the node does not need to be retained; for each node in graph P', the cross-entropy loss is calculated using a softmax function, and the graph convolutional neural network model is trained with an optimization function until the model converges, yielding the trained graph convolutional neural network model;
Step 2, prediction stage
for each detection picture, the set of rectangular frames predicted by the target detection model and filtered by the category score threshold is denoted as B = {b_1, b_2, …, b_n}; B is constructed as a graph, denoted graph P_1, in the same manner as in step 1 (2); forward inference is performed on graph P_1 using the trained graph convolutional neural network model to obtain graph P'_1; for any node v'_i in graph P'_1, if the score of predicted class 1 is greater than a preset threshold, the rectangular frame b_i corresponding to the node is retained, and the set of all retained rectangular frames is the final output result.
2. The method of claim 1, wherein the activation function is a ReLU function.
3. The target detection post-processing method based on graph convolution according to claim 2, wherein the optimization function uses an SGD optimization function or an Adam optimization function.
4. A target detection post-processing device based on graph convolution, characterized in that the device implements or performs a target detection post-processing method based on graph convolution as claimed in any one of claims 1-3.
CN202111536248.6A 2021-12-15 2021-12-15 Target detection post-processing method and device based on graph convolution Active CN114359627B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111536248.6A CN114359627B (en) 2021-12-15 2021-12-15 Target detection post-processing method and device based on graph convolution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111536248.6A CN114359627B (en) 2021-12-15 2021-12-15 Target detection post-processing method and device based on graph convolution

Publications (2)

Publication Number Publication Date
CN114359627A CN114359627A (en) 2022-04-15
CN114359627B (en) 2024-06-07

Family

ID=81099177

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111536248.6A Active CN114359627B (en) 2021-12-15 2021-12-15 Target detection post-processing method and device based on graph convolution

Country Status (1)

Country Link
CN (1) CN114359627B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126472A (en) * 2019-12-18 2020-05-08 南京信息工程大学 Improved target detection method based on SSD
WO2021139069A1 (en) * 2020-01-09 2021-07-15 南京信息工程大学 General target detection method for adaptive attention guidance mechanism
CN111612051A (en) * 2020-04-30 2020-09-01 杭州电子科技大学 Weak supervision target detection method based on graph convolution neural network
WO2021227091A1 (en) * 2020-05-15 2021-11-18 南京智谷人工智能研究院有限公司 Multi-modal classification method based on graph convolutional neural network
CN112733680A (en) * 2020-12-31 2021-04-30 南京视察者智能科技有限公司 Model training method, extracting method and device for generating high-quality face image based on monitoring video stream and terminal equipment
CN112884064A (en) * 2021-03-12 2021-06-01 迪比(重庆)智能科技研究院有限公司 Target detection and identification method based on neural network
CN113255615A (en) * 2021-07-06 2021-08-13 南京视察者智能科技有限公司 Pedestrian retrieval method and device for self-supervision learning

Also Published As

Publication number Publication date
CN114359627A (en) 2022-04-15


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant