CN114359627A - Target detection post-processing method and device based on graph convolution - Google Patents

Target detection post-processing method and device based on graph convolution

Info

Publication number
CN114359627A
Authority
CN
China
Prior art keywords
graph
target detection
rectangular frame
graph convolution
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111536248.6A
Other languages
Chinese (zh)
Other versions
CN114359627B (en)
Inventor
李军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Inspector Intelligent Technology Co ltd
Original Assignee
Nanjing Inspector Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Inspector Intelligent Technology Co ltd filed Critical Nanjing Inspector Intelligent Technology Co ltd
Priority to CN202111536248.6A priority Critical patent/CN114359627B/en
Publication of CN114359627A publication Critical patent/CN114359627A/en
Application granted granted Critical
Publication of CN114359627B publication Critical patent/CN114359627B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a target detection post-processing method and a target detection post-processing device based on graph convolution. The method specifically comprises the following steps. Step 1, training stage: training to obtain a graph convolutional neural network model, by (1) screening out the best-matching prediction boxes and (2) predicting the best-matching set of rectangular boxes using graph convolution. Step 2, prediction stage: for each detected picture, the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold is denoted B; B is constructed as a graph and passed through the trained graph convolutional neural network model; if a node's score for prediction category 1 is greater than a preset threshold, the rectangular box corresponding to that node is retained, and the set of retained rectangular boxes is the final output result. By replacing the NMS operation with a graph convolution operation, no preset NMS threshold is needed, and the post-processing performance of the target detection model can be greatly improved by using both the feature information of each rectangular box and its context information.

Description

Target detection post-processing method and device based on graph convolution
Technical Field
The invention relates to the field of image recognition research, in particular to the field of target detection and the field of deep learning, and particularly relates to a target detection post-processing method and device based on graph convolution.
Background
The post-processing stage of a target detection model comprises two steps: first, prediction results with category scores below a threshold are filtered out; then, an NMS (non-maximum suppression) operation is used to filter overlapping rectangular boxes and obtain the final target detection prediction result. Because the NMS operation uses only the position information of the rectangular boxes, it is easily affected by the preset IoU threshold: if the threshold is too large, several rectangular boxes are easily output for the same target, reducing precision; if the threshold is too small, only one rectangular box may be output for two adjacent targets, reducing recall.
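For contrast, the greedy NMS post-processing criticized above can be sketched as follows. This is a minimal illustration, not the patent's method; the box format (x1, y1, x2, y2) and the helper names are assumptions:

```python
import numpy as np

def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh):
    # Greedy NMS: repeatedly keep the highest-scoring remaining box and
    # drop every box that overlaps it by more than iou_thresh.
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order) > 0:
        i = order[0]
        keep.append(int(i))
        order = np.array([j for j in order[1:]
                          if iou(boxes[i], boxes[j]) <= iou_thresh])
    return keep
```

The `iou_thresh` parameter here is exactly the preset threshold whose sensitivity the background section describes: raising it keeps duplicate boxes, lowering it merges adjacent targets.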
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a target detection post-processing method and device based on graph convolution, which can greatly improve the post-processing performance of a target detection model. The technical scheme is as follows:
the invention provides a target detection post-processing method based on graph convolution, which specifically comprises the following steps:
step 1, training stage: training to obtain a graph convolution neural network model;
(1) screening out the best matching prediction box;
For each training picture, the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold is denoted B = {b1, b2, …, bn}, and the set of rectangular boxes of all real targets in the picture is denoted G = {g1, g2, …, gm}.
A weighted bipartite graph is constructed using the sets B and G. The weight of the edge between vertex bi in set B and vertex gj in set G is defined as the IoU value IoU(bi, gj) between bi and gj, i ∈ {1, 2, …, n}, j ∈ {1, 2, …, m}:

IoU(bi, gj) = area(bi ∩ gj) / area(bi ∪ gj)
The best matching is solved using the KM algorithm so that the total weight of the matching result is maximal. In the best matching result, the set of elements belonging to set B is denoted B′ = {b′1, b′2, …, b′r}, r ≤ m.
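The KM step maximizes the total IoU weight of the matching. As an illustration of that objective only (not of the KM algorithm itself, which solves it in polynomial time), a brute-force search over assignments for tiny inputs can be sketched as:

```python
from itertools import permutations

def best_match(iou_matrix):
    # iou_matrix[i][j] = IoU(b_i, g_j), for n predicted boxes and m
    # ground-truth boxes with n >= m.  Return, for each ground truth,
    # the index of the prediction assigned to it so that the total
    # weight is maximal -- the same objective the KM algorithm solves.
    # Exhaustive search is only viable for very small n and m.
    n, m = len(iou_matrix), len(iou_matrix[0])
    best_perm, best_w = None, -1.0
    for perm in permutations(range(n), m):  # one prediction per ground truth
        w = sum(iou_matrix[perm[j]][j] for j in range(m))
        if w > best_w:
            best_perm, best_w = perm, w
    return list(best_perm), best_w
```

In practice a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` (with `maximize=True`) computes the same optimal assignment.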
(2) Predicting a best matching rectangular box set B' by using graph convolution;
For each training picture, the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold is denoted B = {b1, b2, …, bn}. The set B itself can also be constructed as a graph P, whose node set is denoted V = {v1, v2, …, vn}; the number of elements of set V is the same as that of set B, and the initial feature vector of any node vi is the feature vector corresponding to the rectangular box bi predicted by the target detection model. All nodes in the graph P form a feature matrix H, H ∈ R^(n×p), where n is the number of nodes and p is the dimension of the feature vector with which the target detection model predicts the position of a rectangular box.
The nodes in the graph P are connected to each other pairwise, forming the edge set E = {e1, e2, …, ek} of the graph P, where k = n². The graph P is described by an adjacency matrix A, A ∈ R^(n×n); the element Aij of the adjacency matrix, representing the edge between node vi and node vj, is the IoU value between rectangular box bi and rectangular box bj:

Aij = IoU(bi, bj) = area(bi ∩ bj) / area(bi ∪ bj)
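The adjacency matrix defined above can be built directly from pairwise IoU values. A minimal sketch, assuming boxes in (x1, y1, x2, y2) format:

```python
import numpy as np

def iou(a, b):
    # IoU of two axis-aligned boxes (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def adjacency(boxes):
    # A[i, j] = IoU(b_i, b_j); the diagonal is 1 because every box
    # overlaps itself completely.  All n^2 node pairs get an entry,
    # matching the fully connected edge set E with k = n^2.
    n = len(boxes)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            A[i, j] = iou(boxes[i], boxes[j])
    return A
```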
A multilayer graph convolutional neural network is defined. The number of layers of the network is denoted L, and the operation of each graph convolution layer is H^(l+1) = σ(A H^l W^l), where H^l is the feature matrix of the l-th layer, W^l is the weight matrix of the l-th layer, and σ is the activation function.
After the graph P has undergone the multilayer graph convolution operation, a graph P′ results. For each node v′i in the graph P′: if its corresponding rectangular box bi ∈ B′, its category is set to 1, indicating that the rectangular box corresponding to the node needs to be retained; otherwise its category is set to 0, indicating that the rectangular box corresponding to the node does not need to be retained. The cross-entropy loss of each node in the graph P′ is computed using a softmax function, and the graph convolutional neural network model is trained with an optimization function until the model converges, yielding the trained graph convolutional neural network model.
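The layer rule H^(l+1) = σ(A H^l W^l) can be sketched in NumPy. ReLU is used as the activation, as the embodiments suggest; the weight shapes are illustrative:

```python
import numpy as np

def gcn_forward(A, H, weights):
    # Stacked graph convolution layers: H^{l+1} = ReLU(A @ H^l @ W^l).
    # A: (n, n) adjacency matrix, H: (n, p) node feature matrix,
    # weights: list of per-layer weight matrices W^l.
    for W in weights:
        H = np.maximum(A @ H @ W, 0.0)
    return H
```

The final layer would map to two classes per node (keep / discard), so the last W^l has two output columns.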
Step 2, prediction phase
For each detected picture, the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold is denoted B = {b1, b2, …, bn}. B is constructed as a graph, denoted P1, in the same way as in step 1 (2). Using the trained graph convolutional neural network model, forward inference is performed on the graph P1 to obtain a graph P′1. For any node v′i of the graph P′1, if its score for prediction category 1 is greater than the preset threshold, the rectangular box bi corresponding to the node is retained; the set of all retained rectangular boxes is the final output result.
Preferably, the activation function is a ReLU function.
Preferably, the optimization function uses an SGD optimization function or an Adam optimization function.
Compared with the prior art, the technical scheme has the following beneficial effects: by replacing the NMS operation with a graph convolution operation, no preset NMS threshold is needed, and the post-processing performance of the target detection model can be greatly improved by using both the feature information of each rectangular box and the context information of the rectangular boxes around it.
Detailed Description
In order to clarify the technical solution and working principle of the present invention, the embodiments of the present disclosure are described in further detail below. All the optional technical solutions above may be combined arbitrarily to form the optional embodiments of the present disclosure; they are not described again here.
The terms "step 1," "step 2," "step 3," and the like in the description and claims of this application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that the embodiments of the application described herein may be practiced in sequences other than those described herein.
In a first aspect, the embodiment of the disclosure provides a target detection post-processing method based on graph convolution. Based on a target detection model built on a deep convolutional neural network, a picture is processed in sequence by a backbone module, a neck module and a head module; the model outputs the predicted category and position, the position being described as a rectangular box; and the final position result is obtained by processing the output set of rectangular boxes with graph convolution.
the method specifically comprises the following steps:
step 1, training stage: training to obtain a graph convolution neural network model;
(1) screening out the best matching prediction box;
For each training picture, the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold is denoted B = {b1, b2, …, bn}, and the set of rectangular boxes of all real targets in the picture is denoted G = {g1, g2, …, gm}.
A weighted bipartite graph is constructed using the sets B and G. The weight of the edge between vertex bi in set B and vertex gj in set G is defined as the IoU value between bi and gj, i ∈ {1, 2, …, n}, j ∈ {1, 2, …, m}:

IoU(bi, gj) = area(bi ∩ gj) / area(bi ∪ gj)
The KM algorithm (Kuhn-Munkres algorithm) is used to solve the best matching so that the total weight of the matching result is maximal. In the best matching result, the set of elements belonging to set B is denoted B′ = {b′1, b′2, …, b′r}, r ≤ m; the set B′ solved by the KM algorithm is the optimal result that an NMS post-processing algorithm could produce.
(2) Predicting a best matching rectangular box set B' by using graph convolution;
For each training picture, the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold is denoted B = {b1, b2, …, bn}. The set B itself can also be constructed as a graph P, whose node set is denoted V = {v1, v2, …, vn}; the number of elements of set V is the same as that of set B, and the initial feature vector of any node vi is the feature vector corresponding to the rectangular box bi predicted by the target detection model. All nodes in the graph P form a feature matrix H, H ∈ R^(n×p), where n is the number of nodes and p is the dimension of the feature vector with which the target detection model predicts the position of a rectangular box.
The nodes in the graph P are connected to each other pairwise, forming the edge set E = {e1, e2, …, ek} of the graph P, where k = n². The graph P is described by an adjacency matrix A, A ∈ R^(n×n); the element Aij of the adjacency matrix, representing the edge between node vi and node vj, is the IoU value between rectangular box bi and rectangular box bj:

Aij = IoU(bi, bj) = area(bi ∩ bj) / area(bi ∪ bj)
A multilayer graph convolutional neural network is defined. The number of layers of the network is denoted L, and the operation of each graph convolution layer is H^(l+1) = σ(A H^l W^l), where H^l is the feature matrix of the l-th layer, W^l is the weight matrix of the l-th layer, and σ is the activation function; a ReLU function is generally used.
After the graph P has undergone the multilayer graph convolution operation, a graph P′ results. For each node v′i in the graph P′: if its corresponding rectangular box bi ∈ B′, its category is set to 1, indicating that the rectangular box corresponding to the node needs to be retained; otherwise its category is set to 0, indicating that the rectangular box corresponding to the node does not need to be retained. The cross-entropy loss of each node in the graph P′ is computed using a softmax function, and the graph convolutional neural network model is trained with an optimization function until the model converges, yielding the trained graph convolutional neural network model. Preferably, the optimization function is an SGD optimization function or an Adam optimization function, although other optimization functions are possible.
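The per-node softmax cross-entropy loss used in training can be sketched as follows (labels: 1 = keep the box, 0 = discard; the numerically stabilized softmax is a standard implementation choice, not stated in the patent):

```python
import numpy as np

def node_cross_entropy(logits, labels):
    # logits: (n, 2) class scores per node; labels: (n,) values in {0, 1}.
    # Softmax over the two classes, then mean negative log-likelihood
    # of the true class over all nodes.
    z = logits - logits.max(axis=1, keepdims=True)  # stabilize exp()
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    n = len(labels)
    return -np.log(probs[np.arange(n), labels]).mean()
```

An SGD or Adam optimizer, as the text prefers, would then minimize this loss over the graph convolution weights.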
Step 2, prediction phase
For each detected picture, the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold is denoted B = {b1, b2, …, bn}. B is constructed as a graph, denoted P1, in the same way as in step 1 (2). Using the trained graph convolutional neural network model, forward inference is performed on the graph P1 to obtain a graph P′1. For any node v′i of the graph P′1, if its score for prediction category 1 is greater than the preset threshold, the rectangular box bi corresponding to the node is retained; the set of all retained rectangular boxes is the final output result.
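The prediction-stage rule — keep box bi whenever its node's class-1 score exceeds the threshold — can be sketched as follows; the 0.5 threshold in the usage below is an assumed value, not one specified by the patent:

```python
def keep_boxes(boxes, node_probs, thresh):
    # node_probs[i] = (p_discard, p_keep) for node v'_i after the GCN
    # forward pass; retain a box whenever its "keep" probability
    # exceeds the threshold.
    return [b for b, p in zip(boxes, node_probs) if p[1] > thresh]
```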
In a second aspect, the embodiment of the present disclosure provides a target detection post-processing apparatus based on graph convolution. Based on the same technical concept, the apparatus can implement or execute the target detection post-processing method based on graph convolution in any of the possible implementations above.
The target detection post-processing device based on graph convolution and the target detection post-processing method based on graph convolution provided by the embodiment belong to the same concept, and specific implementation processes are detailed in the method embodiment and are not described herein again.
The invention has been described above by way of example. Obviously, the specific implementation of the invention is not limited to the manner described above; various insubstantial improvements made using the method concept and technical solution of the invention, or direct application of the concept and solution to other occasions without improvement, all fall within the protection scope of the invention.

Claims (4)

1. A target detection post-processing method based on graph convolution is characterized by comprising the following steps:
step 1, training stage: training to obtain a graph convolution neural network model;
(1) screening out the best matching prediction box;
for each training picture, denoting the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold as B = {b1, b2, …, bn}, and denoting the set of rectangular boxes of all real targets in the picture as G = {g1, g2, …, gm};
constructing a weighted bipartite graph using the sets B and G, where the weight of the edge between vertex bi in set B and vertex gj in set G is defined as the IoU value IoU(bi, gj) between bi and gj, i ∈ {1, 2, …, n}, j ∈ {1, 2, …, m}:

IoU(bi, gj) = area(bi ∩ gj) / area(bi ∪ gj);
solving the best matching using the KM algorithm so that the total weight of the matching result is maximal, where, in the best matching result, the set of elements belonging to set B is denoted B′ = {b′1, b′2, …, b′r}, r ≤ m;
(2) Predicting a best matching rectangular box set B' by using graph convolution;
for each training picture, denoting the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold as B = {b1, b2, …, bn}; the set B itself can also be constructed as a graph P, whose node set is denoted V = {v1, v2, …, vn}, the number of elements of set V being the same as that of set B, and the initial feature vector of any node vi being the feature vector corresponding to the rectangular box bi predicted by the target detection model; all nodes in the graph P form a feature matrix H, H ∈ R^(n×p), where n is the number of nodes and p is the dimension of the feature vector with which the target detection model predicts the position of a rectangular box;
the nodes in the graph P are connected to each other pairwise, forming the edge set E = {e1, e2, …, ek} of the graph P, where k = n²; the graph P is described by an adjacency matrix A, A ∈ R^(n×n), where the element Aij, representing the edge between node vi and node vj, is the IoU value between rectangular box bi and rectangular box bj:

Aij = IoU(bi, bj) = area(bi ∩ bj) / area(bi ∪ bj);
defining a multilayer graph convolutional neural network, where the number of layers of the network is denoted L and the operation of each graph convolution layer is H^(l+1) = σ(A H^l W^l), H^l being the feature matrix of the l-th layer, W^l the weight matrix of the l-th layer, and σ the activation function;
after the graph P has undergone the multilayer graph convolution operation, a graph P′ results; for each node v′i in the graph P′, if its corresponding rectangular box bi ∈ B′, its category is set to 1, indicating that the rectangular box corresponding to the node needs to be retained, otherwise its category is set to 0, indicating that the rectangular box corresponding to the node does not need to be retained; computing the cross-entropy loss of each node in the graph P′ using a softmax function, and training the graph convolutional neural network model with an optimization function until the model converges, obtaining the trained graph convolutional neural network model;
step 2, prediction phase
for each detected picture, denoting the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold as B = {b1, b2, …, bn}; constructing B as a graph, denoted P1, in the same way as in step 1 (2); using the trained graph convolutional neural network model, performing forward inference on the graph P1 to obtain a graph P′1; for any node v′i of the graph P′1, if its score for prediction category 1 is greater than the preset threshold, retaining the rectangular box bi corresponding to the node, the set of all retained rectangular boxes being the final output result.
2. The method of claim 1, wherein the activation function is a ReLU function.
3. The graph convolution-based target detection post-processing method according to claim 2, wherein the optimization function uses an SGD optimization function or an Adam optimization function.
4. A graph convolution-based object detection post-processing device, which is characterized by being capable of implementing or executing a graph convolution-based object detection post-processing method according to any one of claims 1 to 3.
CN202111536248.6A 2021-12-15 2021-12-15 Target detection post-processing method and device based on graph convolution Active CN114359627B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111536248.6A CN114359627B (en) 2021-12-15 2021-12-15 Target detection post-processing method and device based on graph convolution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111536248.6A CN114359627B (en) 2021-12-15 2021-12-15 Target detection post-processing method and device based on graph convolution

Publications (2)

Publication Number Publication Date
CN114359627A true CN114359627A (en) 2022-04-15
CN114359627B CN114359627B (en) 2024-06-07

Family

ID=81099177

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111536248.6A Active CN114359627B (en) 2021-12-15 2021-12-15 Target detection post-processing method and device based on graph convolution

Country Status (1)

Country Link
CN (1) CN114359627B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126472A (en) * 2019-12-18 2020-05-08 南京信息工程大学 Improved target detection method based on SSD
CN111612051A (en) * 2020-04-30 2020-09-01 杭州电子科技大学 Weak supervision target detection method based on graph convolution neural network
CN112733680A (en) * 2020-12-31 2021-04-30 南京视察者智能科技有限公司 Model training method, extracting method and device for generating high-quality face image based on monitoring video stream and terminal equipment
CN112884064A (en) * 2021-03-12 2021-06-01 迪比(重庆)智能科技研究院有限公司 Target detection and identification method based on neural network
WO2021139069A1 (en) * 2020-01-09 2021-07-15 南京信息工程大学 General target detection method for adaptive attention guidance mechanism
CN113255615A (en) * 2021-07-06 2021-08-13 南京视察者智能科技有限公司 Pedestrian retrieval method and device for self-supervision learning
WO2021227091A1 (en) * 2020-05-15 2021-11-18 南京智谷人工智能研究院有限公司 Multi-modal classification method based on graph convolutional neural network

Also Published As

Publication number Publication date
CN114359627B (en) 2024-06-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant