CN114359627A - Target detection post-processing method and device based on graph convolution
Info
- Publication number
- CN114359627A
- Authority
- CN
- China
- Prior art keywords
- graph
- target detection
- rectangular frame
- graph convolution
- node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention discloses a target detection post-processing method and device based on graph convolution. The method specifically comprises the following steps. Step 1, training stage: training to obtain a graph convolution neural network model by (1) screening out the best-matching prediction boxes and (2) predicting the best-matching set of rectangular boxes using graph convolution. Step 2, prediction stage: for each detected picture, the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold is denoted B; B is constructed as a graph and fed to the trained graph convolution neural network model; if the score of a node for prediction category 1 is greater than a preset threshold, the rectangular box corresponding to that node is retained, and the set of retained rectangular boxes is the final output result. By using the graph convolution operation instead of the NMS operation, no preset NMS threshold is needed, and using the feature information of each rectangular box together with its context information can greatly improve the post-processing performance of the target detection model.
Description
Technical Field
The invention relates to the field of image recognition, in particular to the fields of target detection and deep learning, and more particularly to a target detection post-processing method and device based on graph convolution.
Background
The post-processing stage of a target detection model comprises two steps: first, prediction results whose category score is lower than a threshold are filtered out; then the NMS operation is used to filter overlapping rectangular boxes and obtain the final target detection prediction result. Because the NMS operation uses only the position information of the rectangular boxes, it is easily affected by the preset IoU threshold: if the threshold is too large, several rectangular boxes are easily output for the same target and the precision decreases; if the threshold is too small, only one rectangular box is easily output for two adjacent targets and the recall decreases.
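For reference, a minimal Python/NumPy sketch of this conventional NMS step is shown below; the [x1, y1, x2, y2] box layout and the helper names are illustrative assumptions and not part of the claimed method.

```python
import numpy as np

def iou(box, boxes):
    # IoU between one box and an array of boxes, both in [x1, y1, x2, y2] format.
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_threshold=0.5):
    # Greedy NMS: keep the highest-scoring box, discard boxes whose IoU with it
    # exceeds the preset threshold, and repeat on the remaining boxes.
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        overlaps = iou(boxes[i], boxes[order[1:]])
        order = order[1:][overlaps <= iou_threshold]
    return keep
```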
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a target detection post-processing method and device based on graph convolution, which can greatly improve the post-processing performance of a target detection model. The technical scheme is as follows:
the invention provides a target detection post-processing method based on graph convolution, which specifically comprises the following steps:
step 1, training stage: training to obtain a graph convolution neural network model;
(1) screening out the best matching prediction box;
For each training picture, the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold is denoted B = {b1, b2, …, bn}, and the set of rectangular boxes of all real targets in the picture is denoted G = {g1, g2, …, gm}.
A weighted bipartite graph is constructed using the set B and the set G, where the weight of the edge between vertex bi in set B and vertex gj in set G is defined as the IoU value IoU(bi, gj) between bi and gj, with i ∈ {1, 2, …, n} and j ∈ {1, 2, …, m}.
The best matching is solved using the KM algorithm so that the total weight of the matching result is maximized; in the best matching result, the set of elements belonging to set B is denoted B' = {b'1, b'2, …, b'r}, r ≤ m.
(2) Predicting a best matching rectangular box set B' by using graph convolution;
For each training picture, the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold is denoted B = {b1, b2, …, bn}. The set B itself can also be constructed as a graph P whose node set is denoted V = {v1, v2, …, vn}; the number of elements of set V is the same as that of set B, and the initial feature vector of any node vi is the feature vector corresponding to the predicted rectangular box bi of the target detection model. All nodes in graph P form a feature matrix H, H ∈ R^(n×p), where n is the number of nodes and p is the dimension of the feature vector with which the target detection model predicts the rectangular box position.
The nodes in graph P are connected to each other pairwise, forming the edge set E = {e1, e2, …, ek} of graph P, where k = n². The adjacency matrix A ∈ R^(n×n) describes graph P; the element Aij of the adjacency matrix represents the edge between node vi and node vj, and its value is the IoU between rectangular box bi and rectangular box bj: Aij = IoU(bi, bj).
A multilayer graph convolution neural network is defined, with the number of layers denoted L; the operation of each graph convolution layer is H^(l+1) = σ(A H^l W^l), where H^l is the feature matrix of the l-th layer, W^l is the weight matrix of the l-th layer, and σ is the activation function.
After graph P has undergone the multilayer graph convolution operation, a graph P' results. For each node v'i in graph P', if its corresponding rectangular box bi ∈ B', its category is set to 1, indicating that the rectangular box corresponding to the node needs to be retained; otherwise, its category is set to 0, indicating that the rectangular box corresponding to the node does not need to be retained. The cross-entropy loss of each node in graph P' is computed using a softmax function, and the graph convolution neural network model is trained with an optimization function until the model converges, yielding the trained graph convolution neural network model.
Step 2, prediction phase
For each detected picture, the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold is denoted B = {b1, b2, …, bn}. B is constructed as a graph, denoted P1, in the same way as in step 1 (2). Using the trained graph convolution neural network model, forward inference is performed on graph P1 to obtain a graph P'1. For any node v'i in graph P'1, if its score for prediction category 1 is greater than a preset threshold, the rectangular box bi corresponding to that node is retained; the set of all retained rectangular boxes is the final output result.
Preferably, the activation function is a ReLU function.
Preferably, the optimization function uses an SGD optimization function or an Adam optimization function.
Compared with the prior art, one of the technical schemes has the following beneficial effects: by using the graph convolution operation instead of the NMS operation, no preset NMS threshold is needed, and using the feature information of each rectangular box together with its context information can greatly improve the post-processing performance of the target detection model.
Detailed Description
In order to clarify the technical solution and the working principle of the present invention, the embodiments of the present disclosure will be described in further detail below. All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
The terms "step 1," "step 2," "step 3," and the like in the description and claims of this application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that the embodiments of the application described herein may be practiced in sequences other than those described herein.
In a first aspect, the embodiment of the disclosure provides a target detection post-processing method based on graph convolution. Based on a target detection model built on a deep convolutional neural network, a picture is processed in sequence by a backbone module, a neck module and a head module, and the model outputs the predicted category and position, the position being described as a rectangular box; the final position result is obtained by processing the output set of rectangular boxes with graph convolution;
the method specifically comprises the following steps:
step 1, training stage: training to obtain a graph convolution neural network model;
(1) screening out the best matching prediction box;
For each training picture, the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold is denoted B = {b1, b2, …, bn}, and the set of rectangular boxes of all real targets in the picture is denoted G = {g1, g2, …, gm}.
A weighted bipartite graph is constructed using the set B and the set G, where the weight of the edge between vertex bi in set B and vertex gj in set G is defined as the IoU value between bi and gj, with i ∈ {1, 2, …, n} and j ∈ {1, 2, …, m}.
The KM algorithm (Kuhn-Munkres algorithm) is used to solve the best matching so that the total weight of the matching result is maximized. In the best matching result, the set of elements belonging to set B is denoted B' = {b'1, b'2, …, b'r}, r ≤ m; the set B' solved by the KM algorithm is the optimal result of the NMS post-processing algorithm.
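As an illustrative sketch only, this matching step could be realized with SciPy's linear_sum_assignment, which solves the same assignment problem as the Kuhn-Munkres algorithm; the iou helper from the background sketch and the function name are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def best_matching_set(pred_boxes, gt_boxes):
    # Weight matrix of the bipartite graph: entry (i, j) is the IoU between
    # predicted box b_i and ground-truth box g_j.
    pred_boxes = np.asarray(pred_boxes, dtype=np.float32)
    gt_boxes = np.asarray(gt_boxes, dtype=np.float32)
    weights = np.stack([iou(b, gt_boxes) for b in pred_boxes])   # shape (n, m)
    # Kuhn-Munkres / Hungarian assignment maximizing the total matching weight.
    rows, cols = linear_sum_assignment(weights, maximize=True)
    # Predicted boxes matched to a real target with non-zero overlap form the set B'.
    return [int(i) for i, j in zip(rows, cols) if weights[i, j] > 0]
```

The returned indices identify the predicted boxes whose nodes are labeled 1 in step (2) below.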
(2) Predicting a best matching rectangular box set B' by using graph convolution;
For each training picture, the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold is denoted B = {b1, b2, …, bn}. The set B itself can also be constructed as a graph P whose node set is denoted V = {v1, v2, …, vn}; the number of elements of set V is the same as that of set B, and the initial feature vector of any node vi is the feature vector corresponding to the predicted rectangular box bi of the target detection model. All nodes in graph P form a feature matrix H, H ∈ R^(n×p), where n is the number of nodes and p is the dimension of the feature vector with which the target detection model predicts the rectangular box position.
The nodes in graph P are connected to each other pairwise, forming the edge set E = {e1, e2, …, ek} of graph P, where k = n². The adjacency matrix A ∈ R^(n×n) describes graph P; the element Aij of the adjacency matrix represents the edge between node vi and node vj, and its value is the IoU between rectangular box bi and rectangular box bj: Aij = IoU(bi, bj).
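A possible construction of the feature matrix H and the IoU adjacency matrix A is sketched below, assuming NumPy box arrays in [x1, y1, x2, y2] format and the iou helper from the background sketch.

```python
import numpy as np

def build_graph(pred_features, pred_boxes):
    # Node feature matrix H in R^(n x p): one row per predicted rectangular box.
    H = np.asarray(pred_features, dtype=np.float32)
    boxes = np.asarray(pred_boxes, dtype=np.float32)
    n = boxes.shape[0]
    # Adjacency matrix A in R^(n x n) with A[i, j] = IoU(b_i, b_j); every pair of
    # nodes is connected, and the diagonal equals 1 (a box's IoU with itself),
    # which acts as a self-loop in the graph convolution.
    A = np.zeros((n, n), dtype=np.float32)
    for i in range(n):
        A[i] = iou(boxes[i], boxes)
    return H, A
```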
A multilayer graph convolution neural network is defined, with the number of layers denoted L; the operation of each graph convolution layer is H^(l+1) = σ(A H^l W^l), where H^l is the feature matrix of the l-th layer, W^l is the weight matrix of the l-th layer, and σ is the activation function, for which a ReLU function is generally used.
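A minimal PyTorch sketch of this layer rule with ReLU as σ follows; the hidden size, layer count and two-class output head are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    # One layer of the rule H^(l+1) = relu(A @ H^l @ W^l).
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)   # W^l

    def forward(self, H, A):
        return torch.relu(A @ self.linear(H))

class BoxGCN(nn.Module):
    # L stacked graph convolution layers followed by a per-node keep/discard classifier.
    def __init__(self, in_dim, hidden_dim=64, num_layers=3, num_classes=2):
        super().__init__()
        dims = [in_dim] + [hidden_dim] * (num_layers - 1)
        self.layers = nn.ModuleList(GraphConvLayer(d, hidden_dim) for d in dims)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, H, A):
        for layer in self.layers:
            H = layer(H, A)
        return self.classifier(H)   # logits of shape (n, 2)
```

Following the formula above, the raw IoU adjacency A is used directly; normalized variants of A are common in graph convolution networks but are not required by the description.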
After graph P is subjected to the multilayer graph convolution operation, a graph P' is obtained. For each node v'i in graph P', if its corresponding rectangular box bi ∈ B', its category is set to 1, indicating that the rectangular box corresponding to the node needs to be retained; otherwise, its category is set to 0, indicating that the rectangular box corresponding to the node does not need to be retained. The cross-entropy loss of each node in graph P' is computed using a softmax function, and the graph convolution neural network model is trained with an optimization function until the model converges, yielding the trained graph convolution neural network model. Preferably, the optimization function is an SGD optimization function or an Adam optimization function, although other optimization functions are possible.
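A hedged training-loop sketch is given below; the per-picture (H, A, labels) data layout and the choice of Adam among the optimizers mentioned above are assumptions.

```python
import torch
import torch.nn as nn

def train_gcn(model, graphs, epochs=50, lr=1e-3):
    # graphs: iterable of (H, A, labels) tensors per training picture, where
    # labels[i] = 1 if box b_i belongs to the best-matching set B', else 0.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)   # SGD is an equally valid choice
    criterion = nn.CrossEntropyLoss()                         # softmax + cross-entropy per node
    model.train()
    for _ in range(epochs):
        for H, A, labels in graphs:
            logits = model(H, A)              # per-node keep/discard logits, shape (n, 2)
            loss = criterion(logits, labels)  # cross-entropy over all nodes of graph P'
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```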
Step 2, prediction phase
For each detected picture, the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold is denoted B = {b1, b2, …, bn}. B is constructed as a graph, denoted P1, in the same way as in step 1 (2). Using the trained graph convolution neural network model, forward inference is performed on graph P1 to obtain a graph P'1. For any node v'i in graph P'1, if its score for prediction category 1 is greater than a preset threshold, the rectangular box bi corresponding to that node is retained; the set of all retained rectangular boxes is the final output result.
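An inference-stage sketch tying the above pieces together; the build_graph helper, the BoxGCN model and the 0.5 score threshold are illustrative assumptions.

```python
import torch

@torch.no_grad()
def postprocess(model, pred_features, pred_boxes, score_threshold=0.5):
    # Build graph P1 from the detector's filtered boxes, run a forward pass,
    # and keep each box whose "retain" (category 1) probability exceeds the threshold.
    H, A = build_graph(pred_features, pred_boxes)   # helper sketched above
    H, A = torch.from_numpy(H), torch.from_numpy(A)
    model.eval()
    probs = torch.softmax(model(H, A), dim=1)       # shape (n, 2)
    keep = (probs[:, 1] > score_threshold).nonzero(as_tuple=True)[0]
    return [pred_boxes[i] for i in keep.tolist()]
```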
In a second aspect, the embodiment of the present disclosure provides a target detection post-processing apparatus based on graph convolution. Based on the same technical concept, the apparatus can implement or execute the graph convolution-based target detection post-processing method in any one of its possible implementations.
The target detection post-processing device based on graph convolution and the target detection post-processing method based on graph convolution provided by the embodiment belong to the same concept, and specific implementation processes are detailed in the method embodiment and are not described herein again.
The invention has been described above by way of example. It is obvious that the specific implementation of the invention is not limited to the manner described above: various insubstantial modifications made using the method concepts and technical solutions of the invention, as well as the direct application of the concepts and technical solutions of the invention to other occasions without improvement, all fall within the protection scope of the invention.
Claims (4)
1. A target detection post-processing method based on graph convolution is characterized by comprising the following steps:
step 1, training stage: training to obtain a graph convolution neural network model;
(1) screening out the best matching prediction box;
for each training picture, the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold is denoted B = {b1, b2, …, bn}, and the set of rectangular boxes of all real targets in the picture is denoted G = {g1, g2, …, gm};
constructing a weighted bipartite graph using the set B and the set G, wherein the weight of the edge between vertex bi in set B and vertex gj in set G is defined as the IoU value IoU(bi, gj) between bi and gj, i ∈ {1, 2, …, n}, j ∈ {1, 2, …, m};
solving the best matching using the KM algorithm so that the total weight of the matching result is maximized, wherein in the best matching result the set of elements belonging to set B is denoted B' = {b'1, b'2, …, b'r}, r ≤ m;
(2) Predicting a best matching rectangular box set B' by using graph convolution;
for each training picture, the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold is denoted B = {b1, b2, …, bn}; the set B itself can also be constructed as a graph P whose node set is denoted V = {v1, v2, …, vn}, the number of elements of set V being the same as that of set B, and the initial feature vector of any node vi being the feature vector corresponding to the predicted rectangular box bi of the target detection model; all nodes in graph P form a feature matrix H, H ∈ R^(n×p), wherein n is the number of nodes and p is the dimension of the feature vector with which the target detection model predicts the rectangular box position;
the nodes in graph P are connected to each other pairwise, forming the edge set E = {e1, e2, …, ek} of graph P, wherein k = n²; the adjacency matrix A ∈ R^(n×n) describes graph P, and the element Aij of the adjacency matrix represents the edge between node vi and node vj, its value being the IoU between rectangular box bi and rectangular box bj: Aij = IoU(bi, bj);
defining a multilayer graph convolution neural network, wherein the number of layers of the network is denoted L, the operation of each graph convolution layer is H^(l+1) = σ(A H^l W^l), H^l is the feature matrix of the l-th layer, W^l is the weight matrix of the l-th layer, and σ is the activation function;
after graph P has undergone the multilayer graph convolution operation, a graph P' results; for each node v'i in graph P', if its corresponding rectangular box bi ∈ B', its category is set to 1, indicating that the rectangular box corresponding to the node needs to be retained, otherwise its category is set to 0, indicating that the rectangular box corresponding to the node does not need to be retained; the cross-entropy loss of each node in graph P' is computed using a softmax function, and the graph convolution neural network model is trained with an optimization function until the model converges, yielding the trained graph convolution neural network model;
step 2, prediction phase
for each detected picture, the set of rectangular boxes predicted by the target detection model and filtered by the category score threshold is denoted B = {b1, b2, …, bn}; B is constructed as a graph, denoted P1, in the same way as in step 1 (2); using the trained graph convolution neural network model, forward inference is performed on graph P1 to obtain a graph P'1; for any node v'i in graph P'1, if its score for prediction category 1 is greater than a preset threshold, the rectangular box bi corresponding to that node is retained, and the set of all retained rectangular boxes is the final output result.
2. The method of claim 1, wherein the activation function is a ReLU function.
3. The graph convolution-based target detection post-processing method according to claim 2, wherein the optimization function uses an SGD optimization function or an Adam optimization function.
4. A graph convolution-based object detection post-processing device, which is characterized by being capable of implementing or executing a graph convolution-based object detection post-processing method according to any one of claims 1 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111536248.6A CN114359627B (en) | 2021-12-15 | 2021-12-15 | Target detection post-processing method and device based on graph convolution |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114359627A (en) | 2022-04-15
CN114359627B CN114359627B (en) | 2024-06-07 |
Family
ID=81099177
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111126472A (en) * | 2019-12-18 | 2020-05-08 | 南京信息工程大学 | Improved target detection method based on SSD |
WO2021139069A1 (en) * | 2020-01-09 | 2021-07-15 | 南京信息工程大学 | General target detection method for adaptive attention guidance mechanism |
CN111612051A (en) * | 2020-04-30 | 2020-09-01 | 杭州电子科技大学 | Weak supervision target detection method based on graph convolution neural network |
WO2021227091A1 (en) * | 2020-05-15 | 2021-11-18 | 南京智谷人工智能研究院有限公司 | Multi-modal classification method based on graph convolutional neural network |
CN112733680A (en) * | 2020-12-31 | 2021-04-30 | 南京视察者智能科技有限公司 | Model training method, extracting method and device for generating high-quality face image based on monitoring video stream and terminal equipment |
CN112884064A (en) * | 2021-03-12 | 2021-06-01 | 迪比(重庆)智能科技研究院有限公司 | Target detection and identification method based on neural network |
CN113255615A (en) * | 2021-07-06 | 2021-08-13 | 南京视察者智能科技有限公司 | Pedestrian retrieval method and device for self-supervision learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |