CN117893838A - Target detection method using diffusion detection model - Google Patents

Info

Publication number
CN117893838A
Authority
CN
China
Prior art keywords
dimensional
frame
low
noise
diffusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410288788.4A
Other languages
Chinese (zh)
Inventor
曹刘娟
罗耀钦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University
Priority to CN202410288788.4A
Publication of CN117893838A
Legal status: Pending

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a target detection method using a diffusion detection model, which improves the accuracy of the diffusion detection model and comprises the following steps: 1. acquiring an input image, and extracting an image feature map of the input image through an image feature extractor; 2. obtaining low-dimensional truth boxes, encoding them through a bounding box encoder that maps the low-dimensional space to a high-dimensional space, and obtaining high-dimensional truth boxes; 3. gradually adding Gaussian noise to the high-dimensional truth boxes according to the noise-adding rule of the diffusion detection model to obtain high-dimensional noise boxes; 4. decoding the high-dimensional noise boxes through a bounding box decoder, which maps the high-dimensional space back to the low-dimensional space used before encoding, to obtain low-dimensional noise boxes; 5. cropping RoI features from the image feature map extracted by the image feature extractor using the low-dimensional noise boxes, inputting the cropped RoI features together with the low-dimensional noise boxes into a detection head, carrying out regression and classification, and predicting the positions and target categories of the corresponding low-dimensional truth boxes.

Description

Target detection method using diffusion detection model
Technical Field
The invention relates to the technical field of target detection, in particular to a target detection method using a diffusion detection model.
Background
Object detection is an important task in computer vision that aims to identify the objects in an image and determine their positions. Traditional target detection algorithms rely mainly on hand-crafted feature extraction, which is computationally expensive and unstable. With the rise of deep learning, deep-learning-based target detection has gradually become a research hotspot. However, existing target detection algorithms still face challenges such as variation in the appearance, shape and pose of objects, and interference from factors such as illumination and occlusion. Meanwhile, diffusion models have achieved remarkable results in image generation, where a denoising diffusion process gradually converts random noise into a clear image. Inspired by this, applying a diffusion model to object detection is a new attempt.
The DiffusionDet model made an initial attempt to apply a diffusion model to the target detection task, modeling target detection as a denoising diffusion process from noise boxes to target boxes. Specifically, in the training phase, the target boxes are diffused from the truth boxes toward a random distribution, and the model learns to reverse this noise-adding process; in the inference phase, the model progressively refines a set of randomly generated boxes into the output results. Although DiffusionDet exhibits excellent performance, it overlooks a mismatch: diffusion models are generally used for image generation, where the diffused object is a high-dimensional image, whereas the diffused object in DiffusionDet is a low-dimensional detection box. The information that can be carried through the diffusion process is therefore limited, the advantages of the diffusion model cannot be fully exploited, and further improvement of DiffusionDet is constrained. In addition, in the detection stage DiffusionDet simply connects several detection heads in series and does not consider the role of region-related features.
Therefore, how to provide a diffusion detection model based on box encoding so as to improve the accuracy of the diffusion detection model is the technical problem to be solved.
Disclosure of Invention
The invention aims to provide a target detection method using a diffusion detection model that solves the problems existing in the prior art and improves the accuracy of the diffusion detection model.
In order to achieve the above object, the solution of the present invention is:
an object detection method using a diffusion detection model, comprising the steps of:
step 1, acquiring an input image, and extracting an image feature map of the input image through an image feature extractor;
step 2, obtaining low-dimensional truth boxes, encoding them through a bounding box encoder that maps the low-dimensional space to a high-dimensional space, and obtaining high-dimensional truth boxes;
step 3, gradually adding Gaussian noise to the high-dimensional truth boxes according to the noise-adding rule of the diffusion detection model to obtain high-dimensional noise boxes;
step 4, decoding the high-dimensional noise boxes through a bounding box decoder, which maps the high-dimensional space back to the low-dimensional space used before encoding, to obtain low-dimensional noise boxes;
step 5, cropping RoI features from the image feature map extracted by the image feature extractor using the low-dimensional noise boxes, inputting the cropped RoI features together with the low-dimensional noise boxes into a detection head, carrying out regression and classification, and predicting the positions and target categories of the corresponding low-dimensional truth boxes;
the detection head has a cascade structure consisting of 4 cascaded stages; each stage receives the image feature map and either a noise box or a prediction box as input and outputs a prediction box, and the last stage also outputs the predicted category; in each stage, the RoIAlign operation extracts RoI features from the image feature map for the noise box or prediction box, and the prediction box is then generated from the extracted RoI features; the RoI features extracted in the last stage are additionally fused by weighting with the RoI features extracted in the other stages before being used to predict the box regression and classification results.
In the step 1, the image feature map is extracted through a ResNet model or a Res2Net model.
The step 2 further comprises: after the low-dimensional truth boxes are obtained, if the number of low-dimensional truth boxes of the input image is less than a prescribed value $N$, padding the number of detection boxes of the diffusion detection model up to the prescribed value $N$.
In the step 2, the bounding box encoder is implemented by a multi-layer perceptron.
The step 3 is specifically: according to a given total number of time steps $T$ and a noise schedule, sampling at any one time step within the $T$ time steps.
In the step 4, the bounding box decoder is implemented by a multi-layer perceptron.
The step 4 further comprises: during inference, first generating a high-dimensional random box with the same dimension as the high-dimensional noise box used in training, and decoding the high-dimensional random box from the high-dimensional space into the low-dimensional space through the bounding box decoder.
After the technical scheme is adopted, the invention has the following technical effects:
by introducing the bounding box encoder and the bounding box decoder, the invention helps to improve the ability to capture information during the diffusion of the low-dimensional truth boxes and accelerates their diffusion to a Gaussian distribution; by fusing the RoI features of the other stages into the last stage of the detection head, it makes reasonable use of region-related information and improves prediction accuracy.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention;
FIG. 2 is a schematic diagram comparing the prior art (upper) with the present invention (lower);
fig. 3 is a schematic structural diagram of a detection head according to an embodiment of the present invention.
Detailed Description
In order to further explain the technical scheme of the invention, the invention is explained in detail by specific examples.
Referring to fig. 1 to 3, the present invention discloses a target detection method using a diffusion detection model, comprising the steps of:
step 1, acquiring an input image, and extracting an image feature map of the input image through an image feature extractor;
step 2, obtaining low-dimensional truth boxes, encoding them through a bounding box encoder that maps the low-dimensional space to a high-dimensional space, and obtaining high-dimensional truth boxes;
step 3, gradually adding Gaussian noise to the high-dimensional truth boxes according to the noise-adding rule of the diffusion detection model to obtain high-dimensional noise boxes;
step 4, decoding the high-dimensional noise boxes through a bounding box decoder, which maps the high-dimensional space back to the low-dimensional space used before encoding, to obtain low-dimensional noise boxes;
step 5, cropping RoI features from the image feature map extracted by the image feature extractor using the low-dimensional noise boxes, inputting the cropped RoI features together with the low-dimensional noise boxes into a detection head, carrying out regression and classification, and predicting the positions and target categories of the corresponding low-dimensional truth boxes.
The diffusion detection model in the prior art ignores the dimensionality of the diffused object, cannot fully capture effective information during diffusion, and cannot fully utilize region-related information during detection. The invention therefore constructs a bounding box encoder and a bounding box decoder that respectively encode the bounding boxes before diffusion and decode them after diffusion, so that the boxes are fully diffused in the high-dimensional space and remain aligned in the low-dimensional space; during inference, high-dimensional detection boxes can be randomly initialized and mapped to the low-dimensional space through the bounding box decoder; during detection, a feature fusion mechanism fuses the RoI features of the multiple cascaded stages, improving the accuracy of the final classification and regression predictions.
Specific embodiments of the invention are shown below.
In the step 1:
acquiring an input image $I$ and extracting image features $F$, $F = f(I)$,
wherein $I \in \mathbb{R}^{H \times W \times 3}$, the image feature extractor $f$ is a ResNet model or a Res2Net model, $H$ is the height of the input image, $W$ is the width of the input image, 3 indicates that the image consists of the three primary colors red, green and blue, and $C$ is the number of channels of the image features $F$.
The step 2 further includes:
acquiring the set of low-dimensional truth boxes $B$ corresponding to the input image $I$, $B \in \mathbb{R}^{n \times 4}$, where $n$ is the number of low-dimensional truth boxes in the input image $I$; setting the number of targets in any one input image to $N$, where $N$ is greater than or equal to the maximum number of objects in any input image ($N$ is 300 in this embodiment); if $n < N$, padding the number of detection boxes of the diffusion detection model for the input image $I$ up to the prescribed value $N$, at which point $B \in \mathbb{R}^{N \times 4}$.
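The padding step can be sketched as follows. This is a minimal illustration rather than the patented implementation: the text only says the box set is filled to the prescribed number (300 in this embodiment), so the choice of random padding boxes, the normalized (cx, cy, w, h) layout, and the helper name `pad_truth_boxes` are all assumptions.

```python
import random


def pad_truth_boxes(boxes, num_boxes=300, seed=0):
    """Pad a list of (cx, cy, w, h) boxes up to num_boxes entries.

    Assumption: padding boxes are drawn uniformly at random in normalized
    [0, 1] coordinates; the source only states that the set is filled to
    the prescribed number.
    """
    if len(boxes) > num_boxes:
        raise ValueError("more truth boxes than the prescribed number")
    rng = random.Random(seed)
    padded = [tuple(box) for box in boxes]
    while len(padded) < num_boxes:
        padded.append(tuple(rng.random() for _ in range(4)))
    return padded


# Two real truth boxes padded up to the prescribed 300.
truth = [(0.5, 0.5, 0.2, 0.3), (0.1, 0.2, 0.05, 0.1)]
padded = pad_truth_boxes(truth)
```

Keeping the original truth boxes at the front of the list makes it easy to tell real targets from padding later on.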
In the step 2, the bounding box encoder is implemented by a multi-layer perceptron, and the calculation formula is as follows:
$$z_0 = \mathrm{MLP}(B)$$
wherein $B$ is the set of low-dimensional truth boxes, $z_0 \in \mathbb{R}^{N \times D}$ represents the encoded high-dimensional truth boxes, $D$ is the dimension of a high-dimensional truth box (128 in this embodiment), and $\mathrm{MLP}$ represents a multi-layer perceptron (the same hereinafter).
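A bounding box encoder of this kind might look like the sketch below, which maps each 4-dimensional box to a 128-dimensional vector with a two-layer perceptron. Only the input and output sizes come from the text; the hidden width, the ReLU activation, the random initialization and all function names are assumptions made for illustration, and the weights are untrained.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Randomly initialised weights and zero biases for one linear layer."""
    scale = in_dim ** -0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def linear(layer, x):
    """Compute W x + b for a single input vector x."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def mlp(layers, x):
    """Apply linear layers with ReLU between them (none after the last)."""
    for layer in layers[:-1]:
        x = [max(0.0, v) for v in linear(layer, x)]
    return linear(layers[-1], x)


rng = random.Random(0)
BOX_DIM, HIDDEN_DIM, HIGH_DIM = 4, 64, 128  # HIDDEN_DIM is an assumption
encoder = [make_linear(BOX_DIM, HIDDEN_DIM, rng),
           make_linear(HIDDEN_DIM, HIGH_DIM, rng)]

truth_boxes = [(0.5, 0.5, 0.2, 0.3), (0.1, 0.2, 0.05, 0.1)]
high_dim_boxes = [mlp(encoder, list(box)) for box in truth_boxes]
```

Each box is encoded independently, so the same function applies unchanged to the full padded set.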
The step 3 specifically comprises the following steps:
according to a given total number of time steps $T$ and a noise schedule, sampling at any one time step $t$ within the $T$ time steps. The noise-adding process can be regarded as a Markov process; given the total number of time steps $T$, by the re-parameterization technique, the noise boxes at time $t$ within the $T$ time steps are calculated as follows:
$$z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \quad \epsilon \sim \mathcal{N}(0, \mathbf{I}), \quad \alpha_t = 1 - \beta_t, \quad \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$$
wherein $z_0$ is the encoded high-dimensional truth boxes; $\beta_t$ increases linearly from $\beta_1$ to $\beta_T$ and controls the magnitude of the noise; $\alpha_t$ and $\bar{\alpha}_t$ are intermediate variables introduced for convenience; as $t$ increases, $\beta_t$ gradually increases and, correspondingly, $\alpha_t$ gradually becomes smaller; $z_t$ represents the noise boxes at time $t$, $z_t \in \mathbb{R}^{N \times D}$; in this embodiment $T$ is 1000.
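The closed-form noising step can be sketched in a few lines. This is an illustration under stated assumptions: the schedule end points `beta_start=1e-4` and `beta_end=0.02` are common defaults from the diffusion literature and are not given in the text, and the function names are hypothetical. Only the linear schedule, the 1000 time steps and the 128-dimensional encoding come from the description.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Betas rising linearly; the end points are assumed values."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta_s) over all steps s up to t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(z0, t, alpha_bars, rng):
    """Sample the noised vector at step t (1-based) in closed form."""
    a = alpha_bars[t - 1]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
            for v in z0]


T = 1000
alpha_bars = cumulative_alphas(linear_beta_schedule(T))
rng = random.Random(0)
z0 = [0.5] * 128          # stand-in for one encoded high-dimensional truth box
t = rng.randint(1, T)     # any one time step within the T steps
zt = add_noise(z0, t, alpha_bars, rng)
```

Because the cumulative product shrinks toward zero as the step index grows, late steps are dominated by the Gaussian term, which is what lets inference start from pure noise.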
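In the step 4, the bounding box decoder is implemented by a multi-layer perceptron, and the calculation formula is as follows:
$$b_t = \mathrm{MLP}(z_t)$$
wherein $z_t$ is the high-dimensional noise boxes and $b_t \in \mathbb{R}^{N \times 4}$ represents the decoded set of low-dimensional noise boxes.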
The step 4 further includes: during inference, first generating high-dimensional random boxes with the same dimension as the high-dimensional noise boxes used in training, and decoding them from the high-dimensional space into the low-dimensional space through the bounding box decoder; the calculation formula is as follows:
$$b_T = \mathrm{MLP}(z_T)$$
wherein $z_T$ represents the high-dimensional random boxes randomly generated from a Gaussian distribution at inference, and $b_T$ represents the low-dimensional noise boxes decoded by the bounding box decoder.
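The inference-time initialization can be sketched as drawing Gaussian vectors and passing them through the decoder. In this hedged example the decoder is an untrained two-layer perceptron; the hidden width, the final sigmoid that keeps coordinates in (0, 1), the use of a standard normal, and the names `make_decoder` and `decode` are assumptions, while the 300 boxes and the 128-to-4 mapping follow the embodiment.

```python
import math
import random


def make_decoder(high_dim, hidden_dim, box_dim, rng):
    """Two weight matrices for an untrained perceptron (biases omitted)."""
    def layer(n_in, n_out):
        s = n_in ** -0.5
        return [[rng.uniform(-s, s) for _ in range(n_in)]
                for _ in range(n_out)]
    return layer(high_dim, hidden_dim), layer(hidden_dim, box_dim)


def decode(decoder, z):
    """Map one high-dimensional vector to a 4-value box."""
    w1, w2 = decoder
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    out = [sum(w * v for w, v in zip(row, hidden)) for row in w2]
    # Sigmoid keeps each coordinate in (0, 1); this squashing is an assumption.
    return [1.0 / (1.0 + math.exp(-v)) for v in out]


rng = random.Random(0)
NUM_BOXES, HIGH_DIM, HIDDEN_DIM, BOX_DIM = 300, 128, 64, 4
decoder = make_decoder(HIGH_DIM, HIDDEN_DIM, BOX_DIM, rng)

# High-dimensional random boxes with the same dimension as in training.
random_boxes = [[rng.gauss(0.0, 1.0) for _ in range(HIGH_DIM)]
                for _ in range(NUM_BOXES)]
low_dim_boxes = [decode(decoder, z) for z in random_boxes]
```

The decoded boxes then play the role of the noise boxes fed to the detection head.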
The step 5 specifically comprises the following steps:
the detection head consists of 4 cascaded stages; each stage receives the image feature map and either a noise box or a prediction box as input and outputs a prediction box, and the last stage also outputs the predicted category; in each stage, the RoIAlign operation extracts RoI features from the image feature map for the noise box or prediction box, and the prediction box is then generated from the extracted RoI features; in order to make full use of the region-related features, the RoI features extracted in the last stage are additionally fused by weighting with the RoI features extracted in the other stages before being used to predict the box regression and classification results. The entire detection flow can be expressed by the following formula:
wherein $F_i$ denotes the RoI features extracted in the $i$-th stage of the detection head; $F_i^j$ denotes the RoI features corresponding to the $j$-th proposal box in the $i$-th stage; $\mathrm{RoIAlign}$ denotes the RoIAlign operation; $\hat{B}_i$ denotes the set of prediction boxes output by the $i$-th stage; $\hat{b}_i^j$ denotes the output box corresponding to the $j$-th input box of the $i$-th stage; $P \in \mathbb{R}^{N \times K}$ denotes the probability that the object in each output box belongs to each category, where $K$ is the number of categories (80 in this embodiment); and $\mathrm{FC}$ denotes a fully connected layer.
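The weighted fusion of per-stage RoI features can be illustrated with a plain weighted sum. This is only a sketch of the idea: the text does not state how the fusion weights are chosen or whether they are learned, so the fixed weights below, the toy 3-value feature vectors and the name `fuse_roi_features` are assumptions.

```python
def fuse_roi_features(stage_features, weights):
    """Weighted sum of the per-stage RoI feature vectors for one box."""
    if len(stage_features) != len(weights):
        raise ValueError("need exactly one weight per stage")
    dim = len(stage_features[0])
    fused = [0.0] * dim
    for features, weight in zip(stage_features, weights):
        for k in range(dim):
            fused[k] += weight * features[k]
    return fused


# Toy RoI features for one box from the 4 cascaded stages.
stage_features = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 2.0],
    [1.0, 1.0, 2.0],
    [2.0, 2.0, 2.0],  # last stage
]
weights = [0.1, 0.1, 0.1, 0.7]  # assumed values, weighted toward the last stage
fused = fuse_roi_features(stage_features, weights)
```

The fused vector would then feed the final box regression and classification layers in place of the last stage's features alone.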
Experimental verification:
The experiments were carried out on the COCO dataset, and the results compared with existing methods are shown in the following table. The method performs markedly better than existing methods and leads DiffusionDet on every metric; with ResNet-50 as the backbone, it improves AP and AP50 by 1.3% and 2.3%, respectively, which demonstrates the performance gain.
The above examples and drawings are not intended to limit the form or style of the present invention, and any suitable variations or modifications made by those skilled in the art should be regarded as not departing from the scope of the present invention.

Claims (7)

1. The target detection method using the diffusion detection model is characterized by comprising the following steps:
step 1, acquiring an input image, and extracting an image feature map of the input image through an image feature extractor;
step 2, obtaining low-dimensional truth boxes, encoding them through a bounding box encoder that maps the low-dimensional space to a high-dimensional space, and obtaining high-dimensional truth boxes;
step 3, gradually adding Gaussian noise to the high-dimensional truth boxes according to the noise-adding rule of the diffusion detection model to obtain high-dimensional noise boxes;
step 4, decoding the high-dimensional noise boxes through a bounding box decoder, which maps the high-dimensional space back to the low-dimensional space used before encoding, to obtain low-dimensional noise boxes;
step 5, cropping RoI features from the image feature map extracted by the image feature extractor using the low-dimensional noise boxes, inputting the cropped RoI features together with the low-dimensional noise boxes into a detection head, carrying out regression and classification, and predicting the positions and target categories of the corresponding low-dimensional truth boxes;
the detection head has a cascade structure consisting of 4 cascaded stages; each stage receives the image feature map and either a noise box or a prediction box as input and outputs a prediction box, and the last stage also outputs the predicted category; in each stage, the RoIAlign operation extracts RoI features from the image feature map for the noise box or prediction box, and the prediction box is then generated from the extracted RoI features; the RoI features extracted in the last stage are additionally fused by weighting with the RoI features extracted in the other stages before being used to predict the box regression and classification results.
2. The target detection method using a diffusion detection model according to claim 1, wherein:
in the step 1, the image feature map is extracted through a ResNet model or a Res2Net model.
3. The target detection method using a diffusion detection model according to claim 1, wherein:
the step 2 further comprises: after the low-dimensional truth boxes are obtained, if the number of low-dimensional truth boxes of the input image is less than a prescribed value $N$, padding the number of detection boxes of the diffusion detection model up to the prescribed value $N$.
4. A target detection method using a diffusion detection model according to claim 1 or 3, wherein:
in the step 2, the bounding box encoder is implemented by a multi-layer perceptron.
5. The target detection method using a diffusion detection model according to claim 1, wherein:
the step 3 is specifically: according to a given total number of time steps $T$ and a noise schedule, sampling at any one time step within the $T$ time steps.
6. The target detection method using a diffusion detection model according to claim 1, wherein:
in the step 4, the bounding box decoder is implemented by a multi-layer perceptron.
7. The target detection method using a diffusion detection model according to claim 1 or 4, wherein:
the step 4 further comprises: during inference, first generating a high-dimensional random box with the same dimension as the high-dimensional noise box used in training, and decoding the high-dimensional random box from the high-dimensional space into the low-dimensional space through the bounding box decoder.
CN202410288788.4A 2024-03-14 2024-03-14 Target detection method using diffusion detection model Pending CN117893838A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410288788.4A CN117893838A (en) 2024-03-14 2024-03-14 Target detection method using diffusion detection model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410288788.4A CN117893838A (en) 2024-03-14 2024-03-14 Target detection method using diffusion detection model

Publications (1)

Publication Number Publication Date
CN117893838A true CN117893838A (en) 2024-04-16

Family

ID=90649102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410288788.4A Pending CN117893838A (en) 2024-03-14 2024-03-14 Target detection method using diffusion detection model

Country Status (1)

Country Link
CN (1) CN117893838A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113837190A (en) * 2021-08-30 2021-12-24 厦门大学 End-to-end instance segmentation method based on Transformer
CN116485682A (en) * 2023-05-04 2023-07-25 北京联合大学 Image shadow removing system and method based on potential diffusion model
CN117236390A (en) * 2023-09-22 2023-12-15 西南石油大学 Reservoir prediction method based on cross attention diffusion model
CN117292007A (en) * 2023-09-28 2023-12-26 支付宝(杭州)信息技术有限公司 Image generation method and device
CN117315263A (en) * 2023-11-28 2023-12-29 杭州申昊科技股份有限公司 Target contour segmentation device, training method, segmentation method and electronic equipment
CN117351325A (en) * 2023-12-06 2024-01-05 浙江省建筑设计研究院 Model training method, building effect graph generation method, equipment and medium
CN117496927A (en) * 2024-01-02 2024-02-02 广州市车厘子电子科技有限公司 Music timbre style conversion method and system based on diffusion model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHOUFA CHEN , PEIZE SUN , ET AL: "DiffusionDet : Diffusion Model for Object Detection", 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 6 October 2023 (2023-10-06), pages 19773 - 19786, XP034513831, DOI: 10.1109/ICCV51070.2023.01816 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination