CN114359627B - Target detection post-processing method and device based on graph convolution - Google Patents

Target detection post-processing method and device based on graph convolution

Info

Publication number
CN114359627B
CN114359627B (application CN202111536248.6A; also published as CN114359627A)
Authority
CN
China
Prior art keywords
graph
rectangular frame
target detection
node
picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111536248.6A
Other languages
Chinese (zh)
Other versions
CN114359627A (en)
Inventor
李军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Inspector Intelligent Technology Co ltd
Original Assignee
Nanjing Inspector Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Inspector Intelligent Technology Co ltd filed Critical Nanjing Inspector Intelligent Technology Co ltd
Priority to CN202111536248.6A priority Critical patent/CN114359627B/en
Publication of CN114359627A publication Critical patent/CN114359627A/en
Application granted granted Critical
Publication of CN114359627B publication Critical patent/CN114359627B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a target detection post-processing method and device based on graph convolution. The method specifically comprises the following steps. Step 1, training phase: train to obtain a graph convolutional neural network model; (1) screen out the best-matching prediction frames; (2) predict the best-matching set of rectangular frames using graph convolution. Step 2, prediction stage: for each detection picture, the set of rectangular frames predicted by the target detection model and filtered by the category score threshold is denoted as B; B is constructed as a graph and processed by the trained graph convolutional neural network model; if the score of predicted class 1 for a node is greater than a preset threshold, the rectangular frame corresponding to that node is retained, and the set of all retained rectangular frames is the final output result. By using the graph convolution operation to replace the NMS operation, the feature information of the rectangular frames and their context information can be exploited without a preset NMS threshold, which can greatly improve the post-processing performance of the target detection model.

Description

Target detection post-processing method and device based on graph convolution
Technical Field
The invention relates to the field of image recognition, in particular to target detection and deep learning, and more particularly to a target detection post-processing method and device based on graph convolution.
Background
The post-processing stage of a target detection model comprises two steps: first, prediction results whose category score is lower than a threshold are filtered out; then the overlapping rectangular frames are filtered by the NMS (non-maximum suppression) operation to obtain the final target detection prediction result. Because the NMS operation uses only the position information of the rectangular frames, it is strongly affected by the preset threshold: if the threshold is too large, multiple rectangular frames are easily output for the same target and the accuracy is reduced; if the threshold is too small, two adjacent targets are likely to yield only one rectangular frame and the recall rate is reduced.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a target detection post-processing method and device based on graph convolution, which can greatly improve the post-processing performance of a target detection model. The technical solution is as follows:
The invention provides a target detection post-processing method based on graph convolution, which specifically comprises the following steps:
step 1, training phase: training to obtain a graph convolution neural network model;
(1) Screening out a prediction frame with the best match;
For each training picture, the set of rectangular frames predicted by the target detection model and filtered by the category score threshold is denoted as B = {b_1, b_2, …, b_n}, and the set of rectangular frames of all real targets in the picture is denoted as G = {g_1, g_2, …, g_m}.
A weighted bipartite graph is constructed using set B and set G, where the weight of the edge between vertex b_i in set B and vertex g_j in set G is defined as the IoU(b_i, g_j) value between b_i and g_j, i ∈ {1, 2, …, n}, j ∈ {1, 2, …, m}.
The best match is solved using the KM algorithm so that the total weight of the matching result is maximized; in the best matching result, the set of elements belonging to set B is denoted as B' = {b'_1, b'_2, …, b'_r}, r ≤ m.
(2) Predicting the best-matching rectangular frame set B' using graph convolution;
For each training picture, the set of rectangular frames predicted by the target detection model and filtered by the category score threshold is denoted as B = {b_1, b_2, …, b_n}. The set B itself is constructed as a graph, denoted graph P; the node set in graph P is denoted V = {v_1, v_2, …, v_n}, the number of elements in set V being equal to the number of elements in set B, and for any element v_i in set V, its initial feature vector is the feature vector corresponding to the rectangular frame b_i predicted by the target detection model. All nodes in graph P form a feature matrix H, H ∈ R^(n×p), where n is the number of nodes and p is the dimension of the feature vector at the position of the rectangular frame predicted by the target detection model.
The nodes in graph P are pairwise connected, forming the edge set E = {e_1, e_2, …, e_k} of graph P, where k = n^2. Graph P is described by the adjacency matrix A, A ∈ R^(n×n); element a_ij of the adjacency matrix A represents the edge between node v_i and node v_j, and its value is the IoU between rectangular frame b_i and rectangular frame b_j: a_ij = IoU(b_i, b_j).
A multi-layer graph convolutional neural network is defined, the number of layers being denoted L. The operation of each graph-convolution layer is defined as H^(l+1) = σ(A H^l W^l), where H^l is the feature matrix of layer l, W^l is the weight matrix of layer l, and σ is the activation function.
After graph P passes through the multi-layer graph convolution operation, graph P' is obtained. For each node v'_i in graph P', if its corresponding rectangular frame b_i ∈ B', its class is set to 1, indicating that the rectangular frame corresponding to the node needs to be retained; otherwise its class is set to 0, indicating that the rectangular frame corresponding to the node does not need to be retained. For each node in graph P', the cross-entropy loss is calculated using a softmax function, and the graph convolutional neural network model is trained with an optimization function until the model converges, yielding the trained graph convolutional neural network model.
Step 2, prediction stage
For each detection picture, the set of rectangular frames predicted by the target detection model and filtered by the category score threshold is denoted as B = {b_1, b_2, …, b_n}; B is constructed as a graph, denoted graph P_1, in the same manner as in step 1 (2). Forward inference is performed on graph P_1 using the trained graph convolutional neural network model to obtain graph P'_1. For any node v'_i in graph P'_1, if the score of predicted class 1 is greater than a preset threshold, the rectangular frame b_i corresponding to the node is retained; the set of all retained rectangular frames is the final output result.
Preferably, the activation function employs a ReLU function.
Preferably, the optimization function uses an SGD optimization function or an Adam optimization function.
Compared with the prior art, the above technical solution has the following beneficial effect: by using the graph convolution operation to replace the NMS operation, the feature information of the rectangular frames and their context information can be exploited without a preset NMS threshold, which can greatly improve the post-processing performance of the target detection model.
Detailed Description
In order to clarify the technical scheme and working principle of the present invention, the following describes the embodiments of the present disclosure in further detail. Any combination of the above-mentioned optional solutions may be adopted to form an optional embodiment of the present disclosure, which is not described herein in detail.
The terms "step 1," "step 2," "step 3," and the like in the description and in the claims are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those described herein.
First aspect: the embodiments of the present disclosure provide a target detection post-processing method based on graph convolution. Based on a target detection model built on a deep convolutional neural network, the picture is processed sequentially by a backbone module, a neck module and a head module; the model outputs the predicted classes and positions, the positions are described in the form of rectangular frames, and the output set of rectangular frames is processed with graph convolution to obtain the final position result;
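As an illustrative sketch only (not the patent's reference implementation), the post-processing could slot in after the detector head roughly as follows; the detector interface, gcn_postprocess and all other names are hypothetical placeholders, with gcn_postprocess standing for the graph-convolution step sketched later in this section.

    # Illustrative end-to-end flow only: a detector (backbone -> neck -> head) yields
    # boxes, category scores and per-box features; boxes above the category score
    # threshold are handed to the graph-convolution post-processing instead of NMS.
    # All function names are hypothetical placeholders.
    def detect(image, detector, gcn_model, score_thresh=0.3):
        boxes, scores, feats = detector(image)                         # backbone/neck/head output
        keep = [i for i, s in enumerate(scores) if s > score_thresh]   # category score filter
        boxes = [boxes[i] for i in keep]
        feats = [feats[i] for i in keep]
        # Graph-convolution post-processing replaces NMS (sketched further below).
        return gcn_postprocess(gcn_model, boxes, feats)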
The method specifically comprises the following steps:
step 1, training phase: training to obtain a graph convolution neural network model;
(1) Screening out a prediction frame with the best match;
For each training picture, the set of rectangular frames predicted by the target detection model and filtered by the category score threshold is denoted as B = {b_1, b_2, …, b_n}, and the set of rectangular frames of all real targets in the picture is denoted as G = {g_1, g_2, …, g_m}.
A weighted bipartite graph is constructed using set B and set G, where the weight of the edge between vertex b_i in set B and vertex g_j in set G is defined as the IoU(b_i, g_j) value between b_i and g_j, i ∈ {1, 2, …, n}, j ∈ {1, 2, …, m}.
The best match is solved using the KM algorithm (Kuhn-Munkres algorithm) so that the total weight of the matching result is maximized. In the best matching result, the set of elements belonging to set B is denoted as B' = {b'_1, b'_2, …, b'_r}, r ≤ m; the set B' obtained by the KM algorithm is the ideal result that an NMS post-processing algorithm should output.
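A minimal sketch of this matching step is given below, assuming boxes in (x1, y1, x2, y2) format; scipy's linear_sum_assignment solves the same maximum-weight assignment problem as the KM algorithm, and the helper names are illustrative.

    # Hypothetical sketch of step 1 (1): maximum-weight bipartite matching between
    # predicted boxes B and ground-truth boxes G using IoU as the edge weight.
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def iou_matrix(boxes_a, boxes_b):
        """Pairwise IoU between two arrays of boxes in (x1, y1, x2, y2) format."""
        a = np.asarray(boxes_a, dtype=np.float64)   # shape (n, 4)
        b = np.asarray(boxes_b, dtype=np.float64)   # shape (m, 4)
        x1 = np.maximum(a[:, None, 0], b[None, :, 0])
        y1 = np.maximum(a[:, None, 1], b[None, :, 1])
        x2 = np.minimum(a[:, None, 2], b[None, :, 2])
        y2 = np.minimum(a[:, None, 3], b[None, :, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
        area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
        union = area_a[:, None] + area_b[None, :] - inter
        return inter / np.maximum(union, 1e-9)      # shape (n, m)

    def best_match_indices(pred_boxes, gt_boxes, min_iou=1e-6):
        """Return indices into pred_boxes forming the best-matching set B'."""
        w = iou_matrix(pred_boxes, gt_boxes)                 # bipartite edge weights
        rows, cols = linear_sum_assignment(w, maximize=True) # KM-style assignment
        # Keep only pairs with positive IoU so unmatched boxes are not labelled 1.
        return [r for r, c in zip(rows, cols) if w[r, c] > min_iou]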
(2) Predicting the best-matching rectangular frame set B' using graph convolution;
For each training picture, the set of rectangular frames predicted by the target detection model and filtered by the category score threshold is denoted as B = {b_1, b_2, …, b_n}. The set B itself is constructed as a graph, denoted graph P; the node set in graph P is denoted V = {v_1, v_2, …, v_n}, the number of elements in set V being equal to the number of elements in set B, and for any element v_i in set V, its initial feature vector is the feature vector corresponding to the rectangular frame b_i predicted by the target detection model. All nodes in graph P form a feature matrix H, H ∈ R^(n×p), where n is the number of nodes and p is the dimension of the feature vector at the position of the rectangular frame predicted by the target detection model.
The nodes in graph P are pairwise connected, forming the edge set E = {e_1, e_2, …, e_k} of graph P, where k = n^2. Graph P is described by the adjacency matrix A, A ∈ R^(n×n); element a_ij of the adjacency matrix A represents the edge between node v_i and node v_j, and its value is the IoU between rectangular frame b_i and rectangular frame b_j: a_ij = IoU(b_i, b_j).
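A short sketch of this graph construction follows, reusing the iou_matrix helper from the matching sketch above; the function name and return convention are assumptions.

    # Hypothetical sketch of the graph construction in step 1 (2): every pair of
    # predicted boxes is connected, and adjacency entry a_ij is IoU(b_i, b_j).
    import numpy as np

    def build_graph(pred_boxes, box_features):
        """Return (H, A): node feature matrix H (n x p) and adjacency matrix A (n x n)."""
        H = np.asarray(box_features, dtype=np.float32)        # one feature vector per box
        A = iou_matrix(pred_boxes, pred_boxes).astype(np.float32)
        # The diagonal IoU(b_i, b_i) equals 1 and conveniently acts as a self-loop.
        return H, A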
A multi-layer graph convolutional neural network is defined, the number of layers being denoted L. The operation of each graph-convolution layer is defined as H^(l+1) = σ(A H^l W^l), where H^l is the feature matrix of layer l, W^l is the weight matrix of layer l, and σ is the activation function; a ReLU function is typically used.
After the multi-layer graph convolution operation, graph P' is obtained. For each node v'_i in graph P', if its corresponding rectangular frame b_i ∈ B', its class is set to 1, indicating that the rectangular frame corresponding to the node needs to be retained; otherwise its class is set to 0, indicating that the rectangular frame corresponding to the node does not need to be retained. For each node in graph P', the cross-entropy loss is calculated using a softmax function, and the graph convolutional neural network model is trained with an optimization function until the model converges, yielding the trained graph convolutional neural network model. Preferably, the optimization function is the SGD optimization function or the Adam optimization function, although other optimization functions are also possible.
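The layer rule H^(l+1) = σ(A H^l W^l) and the per-node softmax / cross-entropy training can be sketched in PyTorch roughly as follows; the plain A·H·W propagation follows the formula exactly as stated above (without the degree normalization used by some GCN variants), and the class names, layer sizes and hyper-parameters are illustrative assumptions.

    # Minimal PyTorch sketch of the multi-layer graph convolution H^(l+1) = sigma(A H^l W^l)
    # with a two-class (keep / discard) output per node, trained with softmax cross-entropy.
    import torch
    import torch.nn as nn

    class GraphConvLayer(nn.Module):
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.weight = nn.Linear(in_dim, out_dim, bias=False)   # W^l

        def forward(self, A, H):
            return torch.relu(A @ self.weight(H))                  # sigma(A H^l W^l), ReLU activation

    class BoxGCN(nn.Module):
        """L-layer GCN that classifies each node (predicted box) as keep (1) or discard (0)."""
        def __init__(self, feat_dim, hidden_dim=128, num_layers=3):
            super().__init__()
            dims = [feat_dim] + [hidden_dim] * num_layers
            self.layers = nn.ModuleList(GraphConvLayer(i, o) for i, o in zip(dims[:-1], dims[1:]))
            self.classifier = nn.Linear(hidden_dim, 2)              # two classes per node

        def forward(self, A, H):
            for layer in self.layers:
                H = layer(A, H)
            return self.classifier(H)                               # logits of shape (n, 2)

    def train_step(model, optimizer, A, H, labels):
        """labels[i] = 1 if box b_i belongs to the best-matching set B', else 0."""
        optimizer.zero_grad()
        logits = model(A, H)
        loss = nn.functional.cross_entropy(logits, labels)          # softmax + cross-entropy loss
        loss.backward()
        optimizer.step()
        return loss.item()

    # Example optimizer choice (SGD as in the preferred embodiment; Adam also works):
    # model = BoxGCN(feat_dim=H.shape[1])
    # optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)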
Step 2, prediction stage
For each detection picture, the set of rectangular frames predicted by the target detection model and filtered by the category score threshold is denoted as B = {b_1, b_2, …, b_n}; B is constructed as a graph, denoted graph P_1, in the same manner as in step 1 (2). Forward inference is performed on graph P_1 using the trained graph convolutional neural network model to obtain graph P'_1. For any node v'_i in graph P'_1, if the score of predicted class 1 is greater than a preset threshold, the rectangular frame b_i corresponding to the node is retained; the set of all retained rectangular frames is the final output result.
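A sketch of this prediction stage follows, assuming the build_graph helper and the GCN model from the earlier sketches; the keep threshold of 0.5 is an illustrative default, not a value specified by the patent.

    # Hypothetical inference sketch: build graph P_1 for one test picture, run the
    # trained GCN forward, and keep every box whose class-1 ("keep") score exceeds
    # the preset threshold.
    import torch

    @torch.no_grad()
    def gcn_postprocess(model, pred_boxes, box_features, keep_thresh=0.5):
        H, A = build_graph(pred_boxes, box_features)                # graph P_1
        logits = model(torch.from_numpy(A), torch.from_numpy(H))    # forward pass -> graph P'_1
        keep_score = torch.softmax(logits, dim=1)[:, 1]             # per-node score of class 1
        keep_idx = (keep_score > keep_thresh).nonzero(as_tuple=True)[0].tolist()
        return [pred_boxes[i] for i in keep_idx]                    # final output boxes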
In a second aspect, embodiments of the present disclosure provide a target detection post-processing device based on graph convolution.
Based on the same technical concept, the device can realize or execute a target detection post-processing method based on graph convolution in any one of all possible implementation manners.
The target detection post-processing device based on graph convolution and the target detection post-processing method based on graph convolution provided in the foregoing embodiments belong to the same concept, and detailed implementation processes of the target detection post-processing device based on graph convolution are detailed in the method embodiments and are not described herein.
While the invention has been described above by way of example, the invention is obviously not limited to the particular embodiments described. Various insubstantial modifications made using the method concepts and technical solutions of the invention, as well as direct applications of the above concepts and solutions to other occasions without improvement, all fall within the protection scope of the invention.

Claims (4)

1. The target detection post-processing method based on graph convolution is characterized by comprising the following steps of:
step 1, training phase: training to obtain a graph convolution neural network model;
(1) Screening out a prediction frame with the best match;
for each training picture, the set of rectangular frames predicted by the target detection model and filtered by the category score threshold is denoted as B = {b_1, b_2, …, b_n}, and the set of rectangular frames of all real targets in the picture is denoted as G = {g_1, g_2, …, g_m};
constructing a weighted bipartite graph using set B and set G, wherein the weight of the edge between vertex b_i in set B and vertex g_j in set G is defined as the IoU(b_i, g_j) value between b_i and g_j, i ∈ {1, 2, …, n}, j ∈ {1, 2, …, m};
solving the best match using the KM algorithm so that the total weight of the matching result is maximized, wherein in the best matching result the set of elements belonging to set B is denoted as B' = {b'_1, b'_2, …, b'_r}, r ≤ m;
(2) Predicting the best-matching rectangular frame set B' using graph convolution;
for each training picture, the set of rectangular frames predicted by the target detection model and filtered by the category score threshold is denoted as B = {b_1, b_2, …, b_n}; the set B itself is constructed as a graph, denoted graph P; the node set in graph P is denoted V = {v_1, v_2, …, v_n}, the number of elements in set V being equal to the number of elements in set B, and for any element v_i in set V, its initial feature vector is the feature vector corresponding to the rectangular frame b_i predicted by the target detection model; all nodes in graph P form a feature matrix H, H ∈ R^(n×p), where n is the number of nodes and p is the dimension of the feature vector at the position of the rectangular frame predicted by the target detection model;
the nodes in graph P are pairwise connected, forming the edge set E = {e_1, e_2, …, e_k} of graph P, where k = n^2; graph P is described by the adjacency matrix A, A ∈ R^(n×n); element a_ij of the adjacency matrix A represents the edge between node v_i and node v_j, and its value is the IoU between rectangular frame b_i and rectangular frame b_j: a_ij = IoU(b_i, b_j);
defining a multi-layer graph convolutional neural network, the number of layers being denoted L, wherein the operation of each graph-convolution layer is defined as H^(l+1) = σ(A H^l W^l), H^l being the feature matrix of layer l, W^l being the weight matrix of layer l, and σ being the activation function;
after graph P passes through the multi-layer graph convolution operation, graph P' is obtained; for each node v'_i in graph P', if its corresponding rectangular frame b_i ∈ B', its class is set to 1, indicating that the rectangular frame corresponding to the node needs to be retained; otherwise its class is set to 0, indicating that the rectangular frame corresponding to the node does not need to be retained; for each node in graph P', the cross-entropy loss is calculated using a softmax function, and the graph convolutional neural network model is trained with an optimization function until the model converges, yielding the trained graph convolutional neural network model;
Step 2, prediction stage
for each detection picture, the set of rectangular frames predicted by the target detection model and filtered by the category score threshold is denoted as B = {b_1, b_2, …, b_n}; B is constructed as a graph, denoted graph P_1, in the same manner as in step 1 (2); forward inference is performed on graph P_1 using the trained graph convolutional neural network model to obtain graph P'_1; for any node v'_i in graph P'_1, if the score of predicted class 1 is greater than a preset threshold, the rectangular frame b_i corresponding to the node is retained, and the set of all retained rectangular frames is the final output result.
2. The method of claim 1, wherein the activation function is a ReLU function.
3. The target detection post-processing method based on graph convolution according to claim 2, wherein the optimization function uses an SGD optimization function or an Adam optimization function.
4. A target detection post-processing device based on graph convolution, characterized in that the device implements or performs a target detection post-processing method based on graph convolution as claimed in any one of claims 1-3.
CN202111536248.6A 2021-12-15 2021-12-15 Target detection post-processing method and device based on graph convolution Active CN114359627B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111536248.6A CN114359627B (en) 2021-12-15 2021-12-15 Target detection post-processing method and device based on graph convolution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111536248.6A CN114359627B (en) 2021-12-15 2021-12-15 Target detection post-processing method and device based on graph convolution

Publications (2)

Publication Number Publication Date
CN114359627A CN114359627A (en) 2022-04-15
CN114359627B (en) 2024-06-07

Family

ID=81099177

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111536248.6A Active CN114359627B (en) 2021-12-15 2021-12-15 Target detection post-processing method and device based on graph convolution

Country Status (1)

Country Link
CN (1) CN114359627B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126472A (en) * 2019-12-18 2020-05-08 南京信息工程大学 Improved target detection method based on SSD
WO2021139069A1 (en) * 2020-01-09 2021-07-15 南京信息工程大学 General target detection method for adaptive attention guidance mechanism
CN111612051A (en) * 2020-04-30 2020-09-01 杭州电子科技大学 Weak supervision target detection method based on graph convolution neural network
WO2021227091A1 (en) * 2020-05-15 2021-11-18 南京智谷人工智能研究院有限公司 Multi-modal classification method based on graph convolutional neural network
CN112733680A (en) * 2020-12-31 2021-04-30 南京视察者智能科技有限公司 Model training method, extracting method and device for generating high-quality face image based on monitoring video stream and terminal equipment
CN112884064A (en) * 2021-03-12 2021-06-01 迪比(重庆)智能科技研究院有限公司 Target detection and identification method based on neural network
CN113255615A (en) * 2021-07-06 2021-08-13 南京视察者智能科技有限公司 Pedestrian retrieval method and device for self-supervision learning

Also Published As

Publication number Publication date
CN114359627A (en) 2022-04-15


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant