CN117726870A

CN117726870A - Diffusion model-based small sample target detection model reinforcement learning method and device

Info

Publication number: CN117726870A
Application number: CN202311748564.9A
Authority: CN
Inventors: 翟宇鹏; 梅继林; 胡瑜
Original assignee: Zhejiang Lab
Current assignee: Zhejiang Lab
Priority date: 2023-12-19
Filing date: 2023-12-19
Publication date: 2024-03-19

Abstract

The invention discloses a diffusion model-based small sample target detection model reinforcement learning method and device, belonging to the field of deep learning and computer vision. The method comprises the following steps: collecting base class data and new class data to pre-train a small sample target detection model; constructing prompt words that describe an object in the new class data, converting the prompt words into a text vector, inputting the text vector into a stable diffusion model, and denoising Gaussian noise under the guidance of the text vector to obtain a new class support set; setting the pixels of an object in an original image of the base class data to zero, inputting the masked image together with the text vector into the stable diffusion model to obtain a generated image related to the prompt words, and restoring the generated image into the original image to serve as a new class training set; and performing reinforcement learning of the small sample target detection model based on the new class support set and the new class training set. The data augmentation scheme and the model fine-tuning strategy based on the stable diffusion model can effectively improve the detection performance of the small sample target detection model when training data are insufficient, and provide stronger generalization capability.

Description

Diffusion model-based small sample target detection model reinforcement learning method and device

Technical Field

The invention belongs to the field of deep learning and computer vision, and particularly relates to a small sample target detection model reinforcement learning method and device based on a diffusion model.

Background

Environment perception is an important technology in the field of autonomous driving and is the basis of vehicle path planning. While the vehicle is running, sensors such as cameras, millimeter-wave radar and lidar acquire raw information about the surroundings, and a perception algorithm then extracts information such as object positions, categories and regions, so that the vehicle can understand its surroundings and plan a target trajectory. Image-based object detection is a particularly important fundamental task in environment perception. An image-based target detection algorithm can effectively detect obstacles around the vehicle and helps ensure driving safety.

The conventional target detection algorithm can obtain a better target detection result under the condition of a large amount of training data. However, for unfamiliar scenarios, as well as some unusual obstacles, the effect of the object detection model tends to be poor due to the lack of effective training data, and accurate results cannot be obtained, which may bring a risk of erroneous judgment to the decision making system of the autonomous vehicle. To solve this problem, a small sample target detection algorithm has been developed that can still give a good detection result when there are only a few or even no valid training data.

For small sample target detection algorithms, researchers have proposed many methods to improve model performance. These methods can be broadly classified into the following categories:

1. Meta-learning-based methods: meta-learning lets a model learn general rules from a small number of samples. A base class data set (seen classes) and a new class data set (unseen classes) are set up separately, and the model is trained on the large base class data set so that it can make predictions on the new class data set with only a small amount of data, quickly adapting to a new target detection task from a few labeled samples.

2. Transfer-learning-based methods: by designing a suitable network structure and loss function, the model is fully pre-trained on the original data set and fine-tuned on the new data set, which improves small sample target detection performance on the new data set.

3. Methods based on data augmentation and information fusion: for example, some methods use data augmentation to increase the number of training samples, while others integrate information from multiple sources into one model through information fusion.

Patent document CN112257810A discloses a marine organism target detection method based on an improved Faster R-CNN, comprising: realizing data augmentation through sample replication and random erasing, and further improving the feature extraction capability of Faster R-CNN on images containing small underwater targets through an enhanced feature extraction backbone network, so that the recognition accuracy for underwater biological targets is improved when training samples are insufficient. However, that invention addresses the shortage of training data through sample replication and similar means, which limits the range of samples it can cover and makes it difficult to satisfy target detection tasks in real scenes.

Patent document CN112949820A discloses a cognitive anti-interference target detection method based on a generative adversarial network, comprising: constructing a generative adversarial network; generating a target data set and an interference data set from the original small sample set; training the generative adversarial network with the target data set and the interference data set respectively to obtain new echo samples; constructing an anti-interference detection network; training the anti-interference detection network with the new echo samples; and performing target detection and interference suppression with the trained anti-interference detection network. However, because the method generates the target and interference data sets from the original small sample set, it cannot produce sufficiently diverse data, and the generalization capability of the trained model is therefore difficult to guarantee.

Disclosure of Invention

The invention aims to provide a diffusion model-based small sample target detection model reinforcement learning method and device, which obtain new class support set data from a stable diffusion model and realize reinforcement learning of the small sample target detection model without collecting additional data, so that the detection performance and generalization capability of the model when training data are insufficient can be effectively improved.

In order to achieve the above purpose, the technical scheme provided by the invention is as follows:

In a first aspect, an embodiment of the invention provides a diffusion model-based small sample target detection model reinforcement learning method, comprising the following steps:

Step 1: acquiring, through a visual sensor in a specific scene, base class data containing common objects and new class data containing unusual objects, the unusual objects serving as target objects;

Step 2: pre-training the small sample target detection model with the base class data and fine-tuning it with the new class data to obtain a trained small sample target detection model;

Step 3: randomly generating new class prompt words for a target object based on a pre-constructed prompt word template, encoding the new class prompt words into new class text vectors, inputting the new class text vectors and randomly generated Gaussian noise into a pre-trained stable diffusion model, and denoising to obtain target object generated images, which serve as a new class support set;

Step 4: labeling the base class data to obtain a labeling frame, extracting the region where the labeling frame is located and performing binary masking to obtain a base class mask map, inputting the new class text vector and the base class mask map into the stable diffusion model to obtain a generated image, and restoring the generated image into the base class data to serve as a new class training set;

Step 5: performing reinforcement learning on the trained small sample target detection model with the new class support set and the new class training set, so as to improve its ability to detect and recognize the new class target objects.

To address the insufficient generalization capability of a small sample target detection model in unfamiliar scenes or when training data are lacking, the invention uses a stable diffusion model to obtain a new class support set and a new class training set for reinforcement learning of the small sample target detection model. Pre-constructed prompt words describing the new class target object are converted into text vectors by a text encoder; the text vectors and a random Gaussian noise image are input into the stable diffusion model, which, under the guidance of the text vectors, generates new class target object images related to the prompt content to serve as the new class support set. The pixels of an object region in the base class data are set to zero, and the masked image is input into the stable diffusion model together with the text vector corresponding to the prompt word to obtain a generated image related to the prompt word, which is restored into the original image of the base class data to serve as the new class training set. Reinforcement learning of the trained small sample target detection model with the new class support set and the new class training set can significantly improve its generalization capability without any additional data collection.

Further, the prompt word template is used to construct a prompt word describing a single target object in the new class data, and has the following format:

$$y = \text{a } P_a \; P_n \text{ of } O_{class} \text{ in } O_{scene} \text{ with } P_c \text{ color}$$

where $P_a$ is an adjective, $P_n$ is a photo noun, $O_{class}$ is the semantic category of the object, $O_{scene}$ is a scene noun for the object, and $P_c$ is a color.
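The slot-filling described by the template can be illustrated with a minimal Python sketch. The word lists below are hypothetical placeholders (the patent does not publish its vocabulary tables), and the helper names `make_prompt` and `make_prompts` are introduced here only for illustration.

```python
import random
from typing import List

# Hypothetical slot vocabularies; the patent does not list its actual word tables.
ADJECTIVES = ["great", "clear", "good"]                    # P_a
PHOTO_NOUNS = ["photo", "picture", "image", "photograph"]  # P_n
SCENES = ["mountain road", "city street", "highway"]       # O_scene
COLORS = ["red", "white", "black"]                         # P_c

# Template from the text: a P_a P_n of O_class in O_scene with P_c color
TEMPLATE = "a {p_a} {p_n} of {o_class} in {o_scene} with {p_c} color"


def make_prompt(o_class: str, rng: random.Random) -> str:
    """Fill every slot of the template with a randomly chosen word."""
    return TEMPLATE.format(
        p_a=rng.choice(ADJECTIVES),
        p_n=rng.choice(PHOTO_NOUNS),
        o_class=o_class,
        o_scene=rng.choice(SCENES),
        p_c=rng.choice(COLORS),
    )


def make_prompts(o_class: str, count: int, seed: int = 0) -> List[str]:
    """Generate `count` prompts for one new class, reproducibly for a given seed."""
    rng = random.Random(seed)
    return [make_prompt(o_class, rng) for _ in range(count)]
```

Calling `make_prompts("motorcyclist", 5)` would return five strings of the form "a ... of motorcyclist in ... with ... color". Seeding the generator is a design choice made here so that a generated support set can be reproduced; the patent itself only states that the prompts are generated randomly.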

Further, in the encoding into a new class text vector, the CLIP text encoder $\varepsilon_{text}$ encodes the new class prompt word $y$ into a high-dimensional new class text vector, expressed as:

$$\zeta_y = \varepsilon_{text}(y)$$

where $\zeta_y$ denotes the new class text vector.

Further, the randomly generated Gaussian noise is an RGB image of size $\mathbb{R}^{H \times W \times 3}$, where $H$ and $W$ denote the height and width of the RGB image, respectively, and the value of each pixel in the RGB image follows a Gaussian distribution. Here $x_i$ denotes the pure-noise RGB image at time $i$, and $p(x_i)$ denotes a pixel value of $x_i$.

The VAE image encoder $\varepsilon_{img}$ compresses the pure-noise RGB image into the latent space to obtain the image feature corresponding to the Gaussian noise, expressed as:

$$z_i = \varepsilon_{img}(x_i)$$

where $z_i$ denotes the image feature corresponding to the pure-noise RGB image at time $i$.
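As a small illustration of the noise image described above, the following NumPy sketch samples an $H \times W \times 3$ array with independent Gaussian pixel values. Zero mean and unit variance are assumptions made here, since the text only states that the pixels follow a Gaussian distribution; the VAE encoding step is not reproduced because it depends on the pre-trained model.

```python
import numpy as np


def sample_noise_image(height: int, width: int, seed: int = 0) -> np.ndarray:
    """Return an H x W x 3 array with entries drawn i.i.d. from N(0, 1).

    Assumption: standard normal noise; the source text does not state
    the mean or variance of the Gaussian distribution.
    """
    rng = np.random.default_rng(seed)
    return rng.standard_normal((height, width, 3))
```

For example, `sample_noise_image(512, 512)` yields an array of shape `(512, 512, 3)` that would play the role of the pure-noise RGB image $x_i$.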

Further, the stable diffusion model is a denoising network based on a U-Net, which uses a cross-attention mechanism to fuse the new class text vector and obtain the noise vector, expressed as:

$$N_i = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d}}\right) V, \qquad Q = W_Q\,\varphi(z_i), \quad K = W_K\,\zeta_y, \quad V = W_V\,\zeta_y$$

where $N_i$ denotes the noise vector at time $i$; $Q$, $K$ and $V$ denote the query, key and value, respectively; $d$ denotes the vector dimension; $W_Q$, $W_K$ and $W_V$ are learnable parameters; and $\varphi(z_i)$ denotes the mapping of the image feature $z_i$ at time $i$ after passing through the U-Net.
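The cross-attention fusion can be sketched in NumPy as follows, assuming the standard scaled dot-product form $\mathrm{softmax}(QK^{T}/\sqrt{d})\,V$ with queries taken from the image features and keys and values taken from the text vectors. The function names, the row-vector convention (`features @ weights`) and all shapes are illustrative assumptions, not details given in the patent.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(image_feats, text_feats, w_q, w_k, w_v):
    """Fuse text information into image features.

    image_feats: (n_img, c_img) rows standing in for the mapped features phi(z_i)
    text_feats:  (n_txt, c_txt) rows standing in for the text vector zeta_y
    w_q: (c_img, d), w_k: (c_txt, d), w_v: (c_txt, d) projection matrices
    Returns the attended output (n_img, d) and the attention weights (n_img, n_txt).
    """
    q = image_feats @ w_q  # queries come from the image side
    k = text_feats @ w_k   # keys come from the text side
    v = text_feats @ w_v   # values come from the text side
    d = q.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d), axis=-1)
    return weights @ v, weights
```

Each row of the returned weights sums to one, so every image location receives a convex combination of the projected text tokens; this is what lets the prompt steer the predicted noise.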

Further, when the stable diffusion model denoises the Gaussian noise, the noise vector of the current time is subtracted from the image feature of the previous time to obtain the image feature of the current time, expressed as:

$$z_i = z_{i+1} - \alpha_i N_i$$

where $z_i$ denotes the image feature at time $i$, $z_{i+1}$ denotes the image feature at time $i+1$, and $\alpha_i$ is a fixed parameter at time $i$.

The fully denoised image feature $z_0$ is restored by the VAE image decoder $D$ to a target object generated image of size $\mathbb{R}^{H \times W \times 3}$, expressed as:

$$\hat{x} = D(z_0)$$

where $\hat{x}$ denotes the target object generated image.
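The iterative update $z_i = z_{i+1} - \alpha_i N_i$ can be sketched as a simple loop. In this sketch `predict_noise` is a hypothetical stand-in for the text-conditioned U-Net, and the step sizes in `alphas` are arbitrary; neither is specified in the patent, and the VAE decoding step is omitted because it requires the pre-trained model.

```python
import numpy as np
from typing import Callable, Sequence


def denoise(
    z_start: np.ndarray,
    alphas: Sequence[float],
    predict_noise: Callable[[np.ndarray, int], np.ndarray],
) -> np.ndarray:
    """Apply z_i = z_{i+1} - alpha_i * N_i for i = T-1, ..., 0.

    z_start is the latent at time T = len(alphas); predict_noise(z, i)
    is a hypothetical callable returning the noise vector N_i.
    """
    z = np.asarray(z_start, dtype=float)
    for i in reversed(range(len(alphas))):
        n_i = predict_noise(z, i)  # N_i predicted from the latent at time i+1
        z = z - alphas[i] * n_i    # move one step toward the clean latent
    return z
```

As a toy check, a predictor that returns `z - target` with every step size set to 0.5 halves the distance to `target` at each step, so the loop converges geometrically. A real denoising network would of course not know the target and would estimate the noise from the latent and the text vector.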

Further, in extracting the region where the labeling frame is located and performing binary masking, the objects in the original image of the base class data are labeled to obtain a labeling frame $B_{gt}(x_b, y_b, w_b, h_b)$, where $x_b$, $y_b$, $w_b$ and $h_b$ denote the abscissa, the ordinate, the width and the height of the labeling frame, respectively, and the pixel values within the labeling frame are set to zero.
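The zeroing of pixels inside the labeling frame can be sketched as follows. The sketch assumes that $(x_b, y_b)$ is the top-left corner of the frame in pixel coordinates, which the text does not state explicitly; `mask_box` is a name introduced here for illustration.

```python
import numpy as np


def mask_box(image: np.ndarray, box):
    """Zero the pixels inside a labeling frame and return the binary mask.

    image: H x W x 3 array.
    box:   (x_b, y_b, w_b, h_b); (x_b, y_b) is ASSUMED to be the top-left
           corner, with width w_b and height h_b in pixels.
    Returns (masked_image, mask), where mask is 1 inside the frame, 0 elsewhere.
    """
    x_b, y_b, w_b, h_b = box
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    mask[y_b:y_b + h_b, x_b:x_b + w_b] = 1
    masked = image.copy()    # leave the original image untouched
    masked[mask == 1] = 0    # pixel zeroing inside the frame
    return masked, mask
```

Returning the mask alongside the masked image is a convenience chosen here, since inpainting-style generation typically needs to know which region is to be filled.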

In a second aspect, in order to achieve the above object, an embodiment of the present invention further provides a diffusion model-based small sample target detection model reinforcement learning device, which includes a data acquisition unit, a small sample target detection model pre-training unit, a new class support set construction unit, a new class training set construction unit, and a small sample target detection model reinforcement learning unit;

the data acquisition unit is used for acquiring basic class data containing common objects and new class data containing unusual objects serving as target objects under a specific scene through the visual sensor;

the small sample target detection model pre-training unit is used for pre-training the small sample target detection model with the base class data and fine-tuning it with the new class data to obtain a trained small sample target detection model;

the new class support set construction unit is used for randomly generating new class prompt words for a target object based on a pre-constructed prompt word template, encoding the new class prompt words into new class text vectors, inputting the new class text vectors and the randomly generated Gaussian noise into a pre-trained stable diffusion model to reduce noise to obtain a target object generated image, and taking the target object generated image as a new class support set;

the new class training set construction unit is used for marking the basic class data to obtain a marking frame, extracting the region where the marking frame is positioned to perform binary masking to obtain a basic class masking map, inputting a new class text vector and the basic class masking map into the stable diffusion model to obtain a generated image, and restoring the generated image into the basic class data to serve as a new class training set;

the small sample target detection model reinforcement learning unit is used for reinforcement learning of the trained small sample target detection model by using the new class support set and the new class training set, and is used for improving the detection and recognition capability of the new class target object.

In a third aspect, in order to achieve the above object, an embodiment of the present invention further provides a diffusion model-based small sample target detection model reinforcement learning device, including a memory and a processor, where the memory is configured to store a computer program, and the processor is configured to implement, when executing the computer program, the diffusion model-based small sample target detection model reinforcement learning method provided in the first aspect.

In a fourth aspect, in order to achieve the above object, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a computer, implements the diffusion model-based small sample target detection model reinforcement learning method provided in the first aspect.

The beneficial effects of the invention are as follows:

The invention provides a small sample target detection reinforcement learning method, a general method that effectively improves the detection performance of small sample target detection models in various scenes. The method can be applied in many fields, such as autonomous driving and unmanned stores, where object categories that are rarely seen in everyday conditions often appear. With this method, the detection performance of the original small sample target detection model can be improved without manually collecting additional data and labels, which facilitates deployment in practical applications.

Drawings

Fig. 1 is a flowchart of a small sample target detection model reinforcement learning method based on a diffusion model according to an embodiment of the present invention.

FIG. 2 is a new class support set image generated on a MSCOCO dataset provided by an embodiment of the present invention.

FIG. 3 is a new class training set image compiled on a MSCOCO data set provided by an embodiment of the present invention.

Fig. 4 is a schematic diagram of the effect of the diffusion model-based small sample target detection model for actual detection.

Fig. 5 is a comparison diagram of a small sample target detection model based on a diffusion model before and after reinforcement learning.

Fig. 6 is a schematic structural diagram of a small sample target detection model reinforcement learning device based on a diffusion model according to an embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the detailed description is presented by way of example only and is not intended to limit the scope of the invention.

The inventive concept is as follows: first, base class data and new class data are collected and used respectively to pre-train and fine-tune a small sample target detection model, yielding a trained small sample target detection model; next, a stable diffusion model denoises Gaussian noise under the guidance of the text vector corresponding to a prompt word describing a new class target object, generating an image of the new class target object that matches the prompt word, and these images serve as the new class support set; then, the pixels of an object in an original image of the base class data are set to zero, the masked image and the prompt word are input into the stable diffusion model to obtain a generated image guided by the prompt word, and the generated image is restored into the original image of the base class data to serve as the new class training set. The new class support set and the new class training set are used for reinforcement learning of the small sample target detection model, improving its generalization capability without collecting additional data.

Fig. 1 is a flowchart of a small sample target detection model reinforcement learning method based on a diffusion model according to an embodiment of the present invention. As shown in fig. 1, an embodiment provides a diffusion model-based small sample target detection model reinforcement learning method, which includes the following steps:

s110, acquiring basic class data containing common objects and new class data containing unusual objects serving as target objects in a specific scene through a visual sensor.

In this embodiment, a camera is used to collect raw RGB image data along National Highway 318, and the raw RGB image data are divided according to a base class standard and a new class standard. The base class standard covers objects that are common on the road, and the new class standard covers objects that are unusual on the road and differ from the base classes. The division yields base class data $D_{base}$ and new class data $D_{novel}$, where $D_{base}$ covers 8 classes of objects common on roads, such as cars, pedestrians and traffic signs, and $D_{novel}$ covers 5 classes of objects rarely seen on urban roads, such as trucks and yaks; the objects in the new class data serve as the target objects.

S120, respectively pre-training and fine-tuning the small sample target detection model by using the basic class data and the new class data to obtain a trained small sample target detection model.

In this embodiment, the small sample target detection model is a model based on the small sample target detection algorithm FSOD and is trained in two stages: (1) pre-training on the base class data; (2) fine-tuning on the new class data. Specifically, the base class data $D_{base}$ constructed in S110 are used to pre-train the small sample target detection model, and the new class data $D_{novel}$ are used to fine-tune the pre-trained model; the training procedure is the same for the base class data and the new class data.

S130, based on a pre-constructed prompt word template, randomly generating new types of prompt words for a target object, encoding the new types of prompt words into new types of text vectors, inputting the new types of text vectors and the randomly generated Gaussian noise into a pre-trained stable diffusion model, and reducing noise to obtain a target object generated image, wherein the target object generated image is used as a new type support set.

In the present embodiment, for the new class data $D_{novel}$, a prompt word template describing a single target object is constructed according to the target object categories contained in the data: a $P_a$ $P_n$ of $O_{class}$ in $O_{scene}$ with $P_c$ color, where $P_a$ is an adjective, $P_n$ is a photo noun, $O_{class}$ is the semantic category of the object, $O_{scene}$ is a scene noun for the object, and $P_c$ is a color.

According to the prompt word template, taking the class motorcyclist as an example, several new class prompt words (prompts) are randomly generated:

A great photo of motorcyclist on mountain road with red color.
A picture of motorcyclist in mountain.
A image of motorcyclist with white color.
A photograph of motorcyclist.
A motorcyclist on the road.
……

Then, the new class prompt words are input into the pre-trained stable diffusion model, with all weight parameters of the stable diffusion model frozen, to obtain new class target object generated images. The detailed steps are as follows:

The CLIP text encoder $\varepsilon_{text}$ encodes the new class prompt word into a high-dimensional new class text vector: $\zeta_y = \varepsilon_{text}(y)$. A pure-noise RGB image $x \in \mathbb{R}^{H \times W \times 3}$ is then randomly generated, in which the value of each pixel follows a Gaussian distribution.

The VAE image encoder $\varepsilon_{img}$ compresses the pure-noise RGB image into the latent space to obtain the image feature $z_i$ corresponding to the pure-noise RGB image, expressed as:

$$z_i = \varepsilon_{img}(x_i)$$

The new class text vector and the image feature are input into the stable diffusion model, and the U-Net progressively removes the noise vector $N_i$ corresponding to the image feature $z_i$ at each time; the $z_0$ obtained at the final time is the fully denoised image feature. The new class text vector is fused by a cross-attention mechanism, and the noise vector fused with the new class text vector is expressed as:

$$N_i = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d}}\right) V, \qquad Q = W_Q\,\varphi(z_i), \quad K = W_K\,\zeta_y, \quad V = W_V\,\zeta_y$$

where $\varphi(z_i)$ denotes the mapping of the image feature $z_i$ at time $i$ after passing through the U-Net, and $W_Q$, $W_K$ and $W_V$ are learnable parameters (fixed network parameters during prediction). The image feature obtained by subtracting the noise vector at each time step is expressed as:

$$z_i = z_{i+1} - \alpha_i N_i$$

where $\alpha_i$ is a fixed parameter at time $i$. Repeating this step over multiple times yields the denoised image feature $z_0$.
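Unrolling the per-step update makes explicit what the repeated subtraction produces. Assuming the process starts from a latent $z_T$ after $T$ steps (the symbol $T$ is introduced here for illustration and does not appear in the source), the update can be summed over all steps:

```latex
\begin{aligned}
z_{T-1} &= z_T - \alpha_{T-1} N_{T-1} \\
z_{T-2} &= z_{T-1} - \alpha_{T-2} N_{T-2} \\
        &\;\;\vdots \\
z_0     &= z_1 - \alpha_0 N_0 \\[4pt]
\Longrightarrow \quad z_0 &= z_T - \sum_{i=0}^{T-1} \alpha_i N_i
\end{aligned}
```

In other words, the fully denoised feature is the initial noisy feature minus the weighted sum of all predicted noise vectors, with the text prompt entering only through each $N_i$.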

The VAE image decoder $D$ restores the denoised image feature $z_0$ to the original size $\mathbb{R}^{H \times W \times 3}$, thereby obtaining the new class target object generated image, which serves as the new class support set.

As shown in fig. 2, in this embodiment images are generated for the MSCOCO dataset: the original dataset images are shown together with the generated images obtained from the stable diffusion model. It can clearly be seen that the target objects in the generated images are the same as those in the original dataset images, which verifies the correctness of the scheme of the present invention.

And S140, marking the basic class data to obtain a marking frame, extracting the region where the marking frame is positioned, performing binary masking to obtain a basic class mask map, inputting a new class text vector and the basic class mask map into a stable diffusion model to obtain a generated image, and restoring the generated image into the basic class data to serve as a new class training set.

As shown in fig. 3 and fig. 4, an object region is selected in the base class data $D_{base}$, the region where the object is located is labeled, and the pixels in the region of the labeling frame are set to zero, i.e. binary masking is performed, to obtain a base class mask map. Fig. 3 shows the generated images corresponding to different given prompt words. As shown in fig. 4, the base class mask map and the new class prompt words from S130 are input together into the stable diffusion model; a cross-attention mechanism extracts the feature information in the new class text vector corresponding to the new class prompt words and guides the generation of an image from the base class mask map under the new class prompt words. Finally, the generated image replaces the region of the labeling frame in the original image, and the region is restored to obtain the edited new image. The edited new images serve as the new class training set for reinforcement learning of the trained small sample target detection network.
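The final region-restoration step, in which generated content replaces the labeling frame region of the original image, can be sketched as follows. This sketch assumes the generated image has the same size as the original and that $(x_b, y_b)$ is the top-left corner of the labeling frame; both are assumptions, and `paste_region` is a hypothetical helper name.

```python
import numpy as np


def paste_region(original: np.ndarray, generated: np.ndarray, box) -> np.ndarray:
    """Copy the labeling frame region of `generated` into a copy of `original`.

    box: (x_b, y_b, w_b, h_b), with (x_b, y_b) ASSUMED to be the top-left corner.
    Pixels outside the frame are taken unchanged from the original image.
    """
    if original.shape != generated.shape:
        raise ValueError("original and generated images must have the same shape")
    x_b, y_b, w_b, h_b = box
    edited = original.copy()
    edited[y_b:y_b + h_b, x_b:x_b + w_b] = generated[y_b:y_b + h_b, x_b:x_b + w_b]
    return edited
```

Keeping everything outside the frame identical to the original preserves the real road background, so only the object inside the frame changes from a base class to a new class.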

And S150, performing reinforcement learning on the trained small sample target detection model by using the new class support set and the new class training set, and improving the detection and recognition capability of the new class target object.

As shown in fig. 5, the new class training set is input into the trained small sample target detection model for enhancement training, with the new class support set used as the ground-truth labels. Reinforcement learning of the trained small sample target detection model with the new class support set and the new class training set can significantly enhance the model's ability to detect and recognize target objects in the new class data. During this enhancement training no additional data are collected: both the training set and the support set come from the output of the stable diffusion model. The proposed method therefore largely solves the problem of training a small sample target detection model and detecting targets when training samples are insufficient, and improves the generalization capability of the model.

Based on the same inventive concept, the embodiment of the invention also provides a small sample target detection model reinforcement learning device 600 based on a diffusion model, as shown in fig. 6, which comprises a data acquisition unit 610, a small sample target detection model pre-training unit 620, a new class support set construction unit 630, a new class training set construction unit 640, and a small sample target detection model reinforcement learning unit 650;

the data acquisition unit 610 is configured to acquire, by using a vision sensor, base class data including common objects and new class data including unusual objects, where the unusual objects are target objects;

the small sample target detection model pre-training unit 620 is configured to pre-train and fine-tune the small sample target detection model by using the base class data and the new class data, respectively, to obtain a trained small sample target detection model;

the new class support set construction unit 630 is configured to randomly generate a new class of prompt word for a target object based on a pre-constructed prompt word template, encode the new class of prompt word into a new class of text vector, input the new class of text vector and the randomly generated gaussian noise into a pre-trained stable diffusion model, and reduce noise to obtain a target object generated image, where the target object generated image is used as a new class support set;

the new class training set construction unit 640 is used for labeling the base class data to obtain a labeling frame, extracting the region where the labeling frame is located, performing binary masking to obtain a base class mask map, inputting a new class text vector and the base class mask map into a stable diffusion model to obtain a generated image, and restoring the generated image into the base class data to serve as a new class training set;

the small sample target detection model reinforcement learning unit 650 is configured to reinforcement learn the trained small sample target detection model using the new class support set and the new class training set, and to enhance detection recognition capability on the new class target object.

Based on the same inventive concept, the embodiment also provides a small sample target detection model reinforcement learning device based on a diffusion model, which comprises a memory and a processor, wherein the memory is used for storing a computer program, and the processor is used for realizing the small sample target detection model reinforcement learning method based on the diffusion model when executing the computer program.

Based on the same inventive concept, the embodiment also provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a computer, implements the diffusion model-based small sample target detection model reinforcement learning method described above.

It should be noted that, the diffusion model-based small sample target detection model reinforcement learning device and the computer-readable storage medium provided in the foregoing embodiments belong to the same concept as the diffusion model-based small sample target detection model reinforcement learning method embodiment, and specific implementation processes thereof are detailed in the diffusion model-based small sample target detection model reinforcement learning method embodiment and are not described herein.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Although the foregoing detailed description of the invention has been provided, it will be apparent to those skilled in the art that modifications may be made to the embodiments described in the foregoing examples, and that certain features may be substituted for those illustrated and described herein. Modifications, equivalents, and alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims

1. The small sample target detection model reinforcement learning method based on the diffusion model is characterized by comprising the following steps of:

2. The diffusion model-based small sample object detection model reinforcement learning method of claim 1, wherein the prompt word template is used for describing the prompt word of a single object in new class data, and the format of the prompt word template is as follows:

$y = \text{a } P_a \; P_n \text{ of } O_{class} \text{ in } O_{scene} \text{ with } P_c \text{ color}$
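The template above can be illustrated with a short Python sketch. The function name `build_prompt` and the example slot values are hypothetical; this text does not define what each placeholder stands for, so the values below are assumptions chosen only to show how the slots are filled.

```python
def build_prompt(p_a: str, p_n: str, o_class: str, o_scene: str, p_c: str) -> str:
    """Fill the template 'a P_a P_n of O_class in O_scene with P_c color'."""
    return f"a {p_a} {p_n} of {o_class} in {o_scene} with {p_c} color"


# Hypothetical slot values, used only for illustration.
prompt = build_prompt("realistic", "photo", "bird", "forest", "red")
print(prompt)
```

Each call yields one prompt word y describing a single object, which is then passed to the text encoder.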

3. The diffusion model-based small sample object detection model reinforcement learning method of claim 2, wherein the encoding into a new class text vector uses a CLIP text encoder $\varepsilon_{text}$ to encode the new class prompt word y into a high-dimensional new class text vector, which is expressed as:

$\zeta_y = \varepsilon_{text}(y)$

wherein $\zeta_y$ represents the new class text vector.

4. The diffusion model-based small sample object detection model reinforcement learning method of claim 3, wherein the randomly generated Gaussian noise is of size $\mathbb{R}^{H \times W \times 3}$, H and W represent the height and width of the RGB image, respectively, and the value of each pixel in the RGB image satisfies:

$z_i = \varepsilon_{img}(x_i)$

5. The diffusion model-based small sample object detection model reinforcement learning method according to claim 4, wherein the stable diffusion model is a noise reduction network based on a Unet network, the Unet network adopts a cross attention mechanism, and noise vectors are obtained by fusing new text vectors, and the noise vectors are expressed as:

wherein $N_i$ represents the noise vector at time i; Q, K and V represent the query, key and value, respectively; d represents the dimension of the vector; [...] are all learnable parameters; and [...] represents the mapping of the image feature $z_i$ at time i after passing through the Unet network.
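The formula of this claim is not present in this text. As a hedged illustration only, the sketch below assumes the standard scaled dot-product cross-attention used in latent diffusion models, softmax(QK^T/sqrt(d))V, with the query projected from the Unet image feature and the key and value projected from the text vector. The names `phi_z`, `zeta_y`, `w_q`, `w_k`, `w_v` and all shapes are assumptions, not symbols confirmed by the claim.

```python
import numpy as np


def cross_attention(phi_z, zeta_y, w_q, w_k, w_v):
    """Scaled dot-product cross-attention: image features attend to text tokens.

    Assumed form (not taken from the claim): softmax(Q K^T / sqrt(d)) V.
    """
    q = phi_z @ w_q    # queries from image features, shape (n_pixels, d)
    k = zeta_y @ w_k   # keys from text vector, shape (n_tokens, d)
    v = zeta_y @ w_v   # values from text vector, shape (n_tokens, d)
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Toy shapes chosen for illustration only.
rng = np.random.default_rng(0)
phi_z = rng.normal(size=(16, 8))    # 16 spatial positions, 8 channels
zeta_y = rng.normal(size=(5, 12))   # 5 text tokens, 12-dimensional embedding
w_q = rng.normal(size=(8, 4))
w_k = rng.normal(size=(12, 4))
w_v = rng.normal(size=(12, 4))
n_i, attn = cross_attention(phi_z, zeta_y, w_q, w_k, w_v)
print(n_i.shape, attn.shape)
```

Each row of `attn` sums to 1, so every spatial position receives a weighted mixture of the text-token values; this is how the prompt can steer the predicted noise.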

6. The diffusion model-based small sample object detection model reinforcement learning method according to claim 5, wherein when the stable diffusion model is adopted to reduce Gaussian noise, the noise vector at the current moment is subtracted from the image feature at the previous moment to obtain the image feature at the current moment, and the image feature at the current moment is expressed as follows:

$z_i = z_{i+1} - \alpha_i N_i$

wherein [...] represents the generated image of the target object.
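The update rule of this claim can be sketched as a simple loop. This is a minimal illustration, assuming a caller-supplied noise predictor in place of the Unet network and a list of step coefficients; `denoise`, `predict_noise`, and `alphas` are hypothetical names, and the toy predictor below is not a trained model.

```python
import numpy as np


def denoise(z_last, predict_noise, alphas):
    """Iterate z_i = z_{i+1} - alpha_i * N_i from the last step down to step 0."""
    z = z_last
    for i in reversed(range(len(alphas))):
        n_i = predict_noise(z, i)   # stand-in for the Unet noise prediction N_i
        z = z - alphas[i] * n_i     # subtract the scaled noise vector
    return z


# Toy predictor (assumption): treats the whole feature as noise.
z_last = np.ones((4, 4))
z_0 = denoise(z_last, lambda z, i: z, alphas=[0.5, 0.5, 0.5])
print(z_0[0, 0])
```

With this toy predictor each step halves the feature, so three steps scale it by 0.125; a real noise predictor would instead remove only the estimated noise component.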

7. The diffusion model-based small sample target detection model reinforcement learning method according to claim 1, wherein extracting the region where the annotation frame is located and performing binary masking comprises: annotating objects in the original image of the base class data to obtain an annotation frame $B_{gt}(x_b, y_b, w_b, h_b)$, wherein $x_b$, $y_b$, $w_b$ and $h_b$ respectively represent the abscissa and the ordinate of the annotation frame and the width and the height of the annotation frame; and setting the pixel values within the range of the annotation frame to zero.
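The zero-setting step of this claim can be sketched with NumPy slicing. This is an illustration under stated assumptions: the claim does not say whether the abscissa and ordinate refer to the top-left corner or the centre of the frame, so the sketch assumes the top-left corner in pixel units; the function name `mask_annotation_frame` is hypothetical.

```python
import numpy as np


def mask_annotation_frame(image, box):
    """Zero the pixels inside the frame and return the image plus a binary mask.

    Assumption: box = (x_b, y_b, w_b, h_b) with (x_b, y_b) the top-left corner.
    """
    x_b, y_b, w_b, h_b = box
    masked = image.copy()                        # leave the original untouched
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    masked[y_b:y_b + h_b, x_b:x_b + w_b] = 0     # pixel zero setting
    mask[y_b:y_b + h_b, x_b:x_b + w_b] = 1       # 1 marks the region to regenerate
    return masked, mask


# Toy 6x8 white RGB image with one hypothetical annotation frame.
image = np.full((6, 8, 3), 255, dtype=np.uint8)
masked, mask = mask_annotation_frame(image, (2, 1, 3, 4))
print(int(mask.sum()))
```

The masked image and the mask are the kind of inputs an inpainting-style diffusion model takes alongside the text vector, so that only the zeroed region is regenerated.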

8. The small sample target detection model reinforcement learning device based on the diffusion model is characterized by comprising a data acquisition unit, a small sample target detection model pre-training unit, a new class support set construction unit, a new class training set construction unit and a small sample target detection model reinforcement learning unit;

9. A diffusion model based small sample object detection model reinforcement learning device comprising a memory for storing a computer program and a processor, characterized in that the processor is adapted to implement the diffusion model based small sample object detection model reinforcement learning method according to any of claims 1-7 when executing the computer program.

10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a computer, implements the diffusion model-based small sample object detection model reinforcement learning method of any one of claims 1 to 7.