CN117274732B - Method and system for constructing optimized diffusion model based on scene memory drive - Google Patents


Info

Publication number: CN117274732B (grant); other version: CN117274732A (application)
Application number: CN202311208862.9A
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 张磊, 张志宇, 甄先通, 左利云, 李欣, 王宝艳
Current and original assignee: Guangdong University of Petrochemical Technology
Application filed by Guangdong University of Petrochemical Technology; priority to CN202311208862.9A
Legal status: Active (granted)

Classifications

    • G06V 10/774 — Image or video recognition or understanding using pattern recognition or machine learning; generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N 3/0455 — Auto-encoder networks; encoder-decoder networks
    • G06N 3/0985 — Hyperparameter optimisation; meta-learning; learning-to-learn
    • G06V 10/82 — Image or video recognition or understanding using neural networks
    • G06V 10/94 — Hardware or software architectures specially adapted for image or video understanding
    • Y02T 10/40 — Engine management systems (climate-change mitigation cross-reference tag)


Abstract

The invention relates to the field of image processing, and in particular to a method and a system for constructing an optimized diffusion model driven by scene memory, comprising the following steps: constructing a plurality of tasks, each comprising a plurality of image samples and the manual annotations corresponding to those samples; constructing one part of the image samples and their manual annotations as a support set and the other part as a query set; constructing a scene memory store and optimizing it with the support and query sets of the plurality of tasks; constructing a scene-memory-driven diffusion model; optimizing the scene-memory-driven diffusion model with the support and query sets of the plurality of tasks; and constructing a new task comprising a plurality of images to be predicted, which are then predicted with the optimized scene-memory-driven diffusion model. The scene-memory-driven diffusion model addresses the problem of parameter noise during prediction and improves the accuracy of the prediction results.

Description

Method and system for constructing optimized diffusion model based on scene memory drive
Technical Field
The invention relates to the field of image processing, and in particular to a method and a system for constructing an optimized diffusion model driven by scene memory.
Background
Meta-learning is an approach to few-shot classification: it learns transferable characteristics of a classifier across many tasks and then uses the support-set data of a specific task to improve classification performance in the few-shot regime. Concretely, assume a task distribution p(T) from which tasks can be sampled repeatedly; each sampled task T_t comprises a support set D^S and a query set D^Q. For each task the training data consists only of the support set, i.e. data from C categories with K samples per category. Because of overfitting, when only K samples per class are available, a classifier trained with conventional methods is unlikely to achieve good performance on the test set. The basic idea of meta-learning is to compensate for the shortage of samples with a large number of tasks. This mirrors human learning habits: having learned to ride a bicycle, one finds it easy to learn to ride a motorcycle, because general knowledge shared between the tasks has been acquired; when a new task is encountered, it can be adapted to quickly without many samples.
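The C-way K-shot episode construction described above can be sketched as follows; the function name `make_episode` and the dict-of-lists dataset layout are illustrative assumptions, not the patent's implementation.

```python
import random

def make_episode(dataset, n_way=5, k_shot=1, n_query=15, seed=0):
    """Sample one C-way K-shot task: a support set of k_shot samples per
    class and a query set of n_query samples per class (layout illustrative)."""
    rng = random.Random(seed)
    classes = rng.sample(sorted(dataset), n_way)          # pick C classes
    support, query = [], []
    for label in classes:
        picks = rng.sample(dataset[label], k_shot + n_query)
        support += [(x, label) for x in picks[:k_shot]]   # K per class
        query += [(x, label) for x in picks[k_shot:]]     # n_query per class
    return support, query
```

Sampling many such episodes from the same pool of classes is what plays the role of the task distribution p(T) here.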
In conventional parameter prediction, model parameters learned from noisily labeled data are themselves noisy, i.e. affected by nonstandard or inaccurate annotation, yet current approaches rarely, if ever, account for this noise. The denoising capability of a diffusion model can help recover clean parameters and predict accurately. Optimization-based meta-learning is gradually becoming an effective way to cope with the few-shot learning dilemma; however, with limited samples, gradient estimates become noisy, often causing the meta-model either to diverge or to sink into a suboptimal minimum of each few-shot task.
Disclosure of Invention
The invention aims to overcome at least one deficiency of the prior art by providing a method and a system for constructing an optimized diffusion model driven by scene memory, which use a scene memory store to solve the parameter-noise problem of the diffusion model and improve the accuracy of the prediction results.
The technical scheme adopted by the invention is as follows:
In a first aspect, a method for constructing an optimized diffusion model based on a scene memory drive is provided, including:
constructing a plurality of tasks, wherein each task comprises a plurality of image samples and manual labels corresponding to the image samples;
constructing a part of the image samples and the manual labels corresponding to the image samples as a support set, and constructing the other part as a query set;
Constructing a scene memory storage, and optimizing the scene memory storage by using a support set and a query set of a plurality of tasks;
Constructing a diffusion model based on scene memory driving according to the scene memory;
optimizing a diffusion model based on a scene memory driver by using a support set and a query set of a plurality of tasks;
and constructing a new task, wherein the new task comprises a plurality of images to be predicted, and predicting the images to be predicted by using the optimized diffusion model based on the scene memory driving.
With optimization-based meta-learning under limited samples, gradient estimates become noisy, often causing the meta-model either to diverge or to sink into a suboptimal minimum of each few-shot task. To mitigate this, the invention constructs a scene-memory-driven diffusion model for meta-learning, avoiding task-specific local-minimum traps by updating the base-model parameters incrementally. First, a scene memory store is built from neural-network checkpoints and used as a prompt to guide updates of the base model on new tasks. Second, a conditional diffusion model serves as the optimizer: conditioned on the parameters to be updated, guided by the scene memory, and combined with the diffusion time step, it predicts the optimal parameter update required by the expected base model. As a result, the base model can be optimized with unfamiliar parameters after only one or a few updates at meta-test time, effectively resolving the parameter-noise problem of the diffusion model and improving prediction accuracy. The method is also universal and flexible, and can be integrated smoothly with existing optimization-based meta-learning models.
Further, constructing the scene memory store and optimizing it with the support and query sets of a plurality of tasks specifically comprises:
constructing a scene memory store for storing task keywords and value fields;
performing inner-loop optimization on the initial parameters of the scene memory store with the support sets of the plurality of tasks to obtain the support-set-optimized parameters;
after the inner-loop update finishes, performing outer-loop updates on the support-set-optimized parameters of the scene memory store with the query sets of the plurality of tasks to obtain the query-set-optimized parameters;
and optimizing the scene memory store according to the support-set-optimized and query-set-optimized parameters.
The invention alternates inner and outer loops over the support and query sets, repeatedly optimizing the parameters and thereby the scene memory store, so that the parameters used while constructing the diffusion model are the current optimum and the noise in the model parameters is reduced.
Further, constructing a scene memory store for storing task keywords and value fields specifically comprises:
constructing the scene memory store M = {M_1, M_2, …, M_N},
where N is the number of storage cells in the scene memory store and is no greater than its capacity; each cell M_i stores a key-value pair, i.e. a keyword and a value field, denoted M_i = [K_i, V_i];
where V_i is the value field of the i-th task and K_i is the keyword of the i-th task, specifically expressed as:
K_i = Transformer([cls, e_1, …, e_n])[0];
where n is the total number of image samples in the support and query sets of the i-th task, e_j = g_φ(x_j) is the feature representation of image sample x_j after it passes through encoder g_φ, x_j is the j-th image sample in the support set of the i-th task, and cls is the class-placeholder token of the i-th task.
The invention stores the keywords and value fields of different tasks in the scene memory store, the value fields being key checkpoints of the different tasks. While optimizing the scene memory store, once the number of tasks grows beyond the store's maximum capacity, the cell that entered the store first is replaced on a first-in, first-out basis, and the M_i = [K_i, V_i] obtained on a new task is added to the store.
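The first-in, first-out replacement policy described above can be sketched with a bounded deque; the class and method names are illustrative assumptions, not the patent's implementation.

```python
from collections import deque

class SceneMemory:
    """FIFO key-value store M = {M_i = [K_i, V_i]} with capacity N:
    once full, the cell that entered first is evicted on each write."""
    def __init__(self, capacity):
        self.cells = deque(maxlen=capacity)  # oldest cell dropped automatically

    def write(self, key, value):
        self.cells.append((key, value))

    def keys(self):
        return [k for k, _ in self.cells]
```

A `deque` with `maxlen` gives the eviction behavior for free: appending to a full deque silently discards the oldest element.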
Further, performing inner-loop optimization on the initial parameters of the scene memory store with the support sets of the plurality of tasks to obtain the support-set-optimized parameters specifically comprises:
using the image samples x_j in the support set of the i-th task T_i and their corresponding manual annotations y_j, the original parameters φ are optimized as:
φ_i = φ − α ∇_φ (1/S) Σ_{j=1}^{S} L(f_φ(x_j), y_j),
where α is the learning rate of the inner-loop optimization, S is the number of image samples in the support set, f_φ(x_j) is the predicted annotation of image sample x_j under the original parameters φ (obtained via the encoder and predictor), L(·,·) is the loss between the prediction and the annotation, and φ_i denotes the support-set-optimized parameters;
the parameters φ_i are updated iteratively with all image samples in the support set of task T_i;
across the task set {T_i}, the parameter φ_i corresponding to each task is updated iteratively, with |{T_i}| denoting the number of tasks in the task set;
the support-set-optimized parameters φ_i corresponding to each task are thereby obtained.
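A minimal numeric sketch of the inner-loop step above, using a linear base learner with squared loss in place of the patent's encoder/predictor f_φ (an assumption for illustration only):

```python
import numpy as np

def inner_update(phi, support_x, support_y, alpha=0.1):
    """One inner-loop step: phi_i = phi - alpha * gradient of the mean
    support-set loss, here the squared error of a linear model."""
    preds = support_x @ phi
    grad = (2.0 / len(support_y)) * support_x.T @ (preds - support_y)
    return phi - alpha * grad
```

With a suitably small α, a single step already lowers the support-set loss, which is the property the alternating optimization relies on.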
Further, performing outer-loop updates on the support-set-optimized parameters of the scene memory store with the query sets of the plurality of tasks to obtain the query-set-optimized parameters specifically comprises:
using the image samples x in the query set of the i-th task and their corresponding manual annotations y, the support-set-optimized parameters φ_i are further optimized as:
φ_i ← φ_i − γ ∇_{φ_i} (1/Q) Σ_{(x,y)∈D_i^Q} L(f_{φ_i}(x), y),
where γ is the learning rate of the outer-loop optimization, Q is the number of image samples in the query set, f_{φ_i}(x) is the predicted annotation of image sample x under the support-set-optimized parameters φ_i, and |{T_i}| denotes the number of tasks in the task set;
across the task set {T_i}, the parameter φ_i corresponding to each task is updated iteratively with that task's query set;
the query-set-optimized parameters φ_i corresponding to each task are thereby obtained.
Within the meta-learning framework, tasks are sampled from a task distribution p(T), producing a sequence of tasks. The core idea of meta-learning is to find a generic meta-learner across the training tasks of the meta-training phase; hence for each task T_i the parameters φ can be optimized to the optimum by alternating inner and outer loops. The inner loop optimizes the parameters with support-set sample data, yielding the support-set-optimized parameters φ_i; the outer loop optimizes with query-set sample data, yielding the query-set-optimized parameters φ_i.
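The alternating inner/outer loops can be sketched as a first-order MAML-style update; this is a simplification (the full outer step differentiates through the inner update), and the linear base learner is an assumption for illustration.

```python
import numpy as np

def meta_step(phi, tasks, alpha=0.1, gamma=0.05):
    """One outer-loop step: adapt phi on each task's support set (inner
    loop), then move phi along the mean query-set gradient evaluated at
    the adapted parameters (first-order approximation)."""
    meta_grad = np.zeros_like(phi)
    for sx, sy, qx, qy in tasks:
        phi_i = phi - alpha * (2.0 / len(sy)) * sx.T @ (sx @ phi - sy)  # inner
        meta_grad += (2.0 / len(qy)) * qx.T @ (qx @ phi_i - qy)         # outer
    return phi - gamma * meta_grad / len(tasks)
```

Repeating this step over the task set drives the shared φ toward parameters that adapt well on every task's query set.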
More specifically, optimizing the scene memory store according to the support-set-optimized and query-set-optimized parameters specifically comprises:
storing, as the value field V_i of the i-th task, the query-set-optimized parameters φ_i, the support-set-optimized parameters φ_i, the average loss computed over the query set, and the accuracy U obtained by comparing the predicted annotations under the query-set-optimized parameters φ_i with those under the original parameters φ, where f_{φ_i}(x) denotes the predicted annotation of image sample x using the query-set-optimized parameters φ_i.
The query-set-optimized parameters, the support-set-optimized parameters, the average loss computed over the query set, and the accuracy obtained from the optimized and original predicted annotations are thus constructed as the value field of the i-th task and stored in the scene memory store for subsequent optimization of the diffusion model.
Further, constructing the scene-memory-driven diffusion model according to the scene memory store specifically comprises:
taking the query-set-optimized parameters φ', randomly initialized initial parameters φ, a prompt P, initial information c, and a time step t as the inputs of the scene-memory-driven diffusion model;
the prompt P comprises the support-set-optimized parameters φ_r, the average loss, and the accuracy U stored in the scene memory store; the initial information c takes the average loss and accuracy obtained on the current task as the initial loss and initial accuracy; and the time step t is the number of loop-optimization iterations in the scene memory store;
the output of the scene-memory-driven diffusion model is expressed as:
φ̂ = Θ_θ(φ', φ, P, c, t, ε),
where Θ_θ denotes the diffusion model with parameters θ, implemented with a Transformer; ε denotes the noise as the diffusion process proceeds, sampled randomly from the Gaussian distribution N(0, I); and α̂ denotes the learning rate of the inner-loop optimization.
When noise is present in a signal, a diffusion model can be configured to predict either the signal or the noise. Conventional diffusion models are usually used to predict the noise; the invention instead uses the diffusion model to predict the signal, i.e. the diffusion model is parameterized to output task-specific parameters. The keyword K_i is computed for the current task, the cosine distance to the keys K of all cells in the scene memory store is calculated, and the parameters φ_r stored in the value field V_r of the cell with the smallest distance are selected as the first input φ'. This input is generally noisy, and the proposed model combines the remaining information to denoise φ' and thereby make more accurate predictions.
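The retrieval step above, selecting the stored checkpoint whose key has the smallest cosine distance to the current task key, can be sketched as follows (function and variable names are illustrative):

```python
import numpy as np

def retrieve(keys, values, query_key):
    """Return the value field V_r of the memory cell whose key K_r has the
    smallest cosine distance (largest cosine similarity) to the task key."""
    K = np.asarray(keys, dtype=float)
    q = np.asarray(query_key, dtype=float)
    sims = K @ q / (np.linalg.norm(K, axis=1) * np.linalg.norm(q) + 1e-12)
    return values[int(np.argmax(sims))]
```

Because cosine similarity normalizes both vectors, retrieval depends only on the direction of the task key, not its magnitude.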
Further, optimizing the scene-memory-driven diffusion model with the support and query sets of a plurality of tasks specifically comprises:
optimizing the objective function by constructing the total loss function:
L(θ, φ) = E_{ε,t}[‖φ* − Θ_θ(φ', φ, P, c, t, ε)‖²] − λ·(1/Q) Σ_{q=1}^{Q} log p(ŷ_q | x_q, φ),
where ε denotes the noise as the diffusion process proceeds, sampled randomly from the Gaussian distribution N(0, I); θ denotes the diffusion-model parameters, obtained via a Transformer; α̂ denotes the learning rate of the inner-loop optimization; φ* denotes the target (query-set-optimized) parameters; x_q denotes an image sample and ŷ_q its corresponding predicted annotation; Q is the number of image samples in the query set; E[·] denotes the expectation; λ is an adjustment parameter; and p(ŷ_q | x_q, φ) is the probability of predicting ŷ_q given x_q and the parameters φ;
the parameters θ and φ of the scene-memory-driven diffusion model are optimized with this total loss function.
The diffusion model represented by Θ_θ in the objective function is implemented with a Transformer of parameters θ; the total loss function is then used to optimize the diffusion-model parameters θ and the baseline-network parameters φ of the scene memory drive, where E[·] denotes the expectation and λ is an adjustment parameter whose specific value can be tuned experimentally.
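Under the reading above, the total loss pairs a signal-prediction (denoising) term with a λ-weighted query-set log-likelihood term. A numeric sketch follows; the exact functional form and all names are assumptions for illustration, not the patent's formula.

```python
import numpy as np

def total_loss(phi_target, phi_pred, query_log_probs, lam=0.1):
    """Sketch: squared error between the clean target parameters and the
    diffusion model's predicted parameters, plus a lambda-weighted negative
    mean log-likelihood over the query samples."""
    diffusion_term = np.mean((np.asarray(phi_target) - np.asarray(phi_pred)) ** 2)
    likelihood_term = -np.mean(query_log_probs)  # negative log-likelihood
    return diffusion_term + lam * likelihood_term
```

The λ knob trades off faithfulness to the retrieved clean parameters against fit to the current task's query set.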
Further, constructing a new task comprising a plurality of images to be predicted and predicting them with the optimized scene-memory-driven diffusion model specifically comprises:
constructing a new task T_new comprising a plurality of images to be predicted, divided into a support set and a query set;
acquiring the keyword K corresponding to the new task T_new;
calculating the cosine distance between K and the keywords K_j of all storage cells in the scene memory store M;
selecting the value field V_r corresponding to the keyword K_r with the smallest cosine distance; acquiring the query-set-optimized parameters φ_r, the support-set-optimized parameters φ_r, the average loss, and the accuracy U stored in V_r; and constructing the support-set-optimized parameters φ_r, the average loss, and the accuracy U into a prompt P;
computing the initial loss and initial accuracy with the support set of the new task T_new and generating the initial information c;
randomly generating the initial parameters φ corresponding to the new task T_new;
inputting the acquired query-set-optimized parameters φ_r, the prompt P, the initial information c, the initial parameters φ of the new task, and the time step t into the scene-memory-driven diffusion model to generate the prediction parameters φ̂ corresponding to the new task T_new;
and completing the prediction of the query set of the new task T_new with the prediction parameters φ̂.
The invention predicts the images to be predicted through the scene memory store and the scene-memory-driven diffusion model, which effectively reduces noise in the prediction process; compared with the prior art, the prediction results are more accurate.
In a second aspect, a system for constructing an optimized diffusion model based on a scene memory drive is provided, comprising:
The task construction module is used for constructing a plurality of tasks, and each task comprises a plurality of image samples and manual labels corresponding to the image samples; one part of the image samples and the manual labels corresponding to the image samples are constructed as a support set, and the other part is constructed as a query set;
The scene memory storage construction module is used for constructing a scene memory storage and optimizing the scene memory storage by using a support set and a query set of a plurality of tasks;
the diffusion model construction module is used for constructing a diffusion model based on the scene memory drive according to the scene memory;
the diffusion model optimization module is used for optimizing a diffusion model based on the scene memory driving by using a support set and a query set of a plurality of tasks;
And the prediction module is used for constructing a new task, wherein the new task comprises a plurality of images to be predicted, and the optimized diffusion model based on the scene memory drive is used for predicting the images to be predicted.
Compared with the prior art, the invention has the beneficial effects that:
(1) According to the invention, the support set and the query set are used for carrying out internal circulation and external circulation, and parameters are repeatedly optimized, so that the scene memory storage is optimized, the parameters in the process of constructing the diffusion model are ensured to be the current optimal, and the noise of the model parameters is reduced;
(2) According to the invention, the basic model can be optimized by using unfamiliar parameters only through single or few times of updating in meta-test, so that the noise problem of diffusion model parameters is effectively solved, and the accuracy of a prediction result is improved;
(3) The invention has universality and flexibility, and can be smoothly integrated with the model of the existing meta learning method based on optimization.
Drawings
Fig. 1 is a flow chart of the method of embodiment 1 of the present invention.
Fig. 2 is a schematic diagram of a scene memory according to embodiment 1 of the present invention.
Fig. 3 is a technical frame diagram of a diffusion model optimization process based on a scene memory driving in embodiment 1 of the present invention.
Fig. 4 is a system configuration diagram of embodiment 2 of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the invention. For better illustration of the following embodiments, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the actual product dimensions; it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
Example 1
As shown in fig. 1, the present embodiment provides a method for constructing an optimized diffusion model based on a scene memory drive, including:
s1, constructing a plurality of tasks, wherein each task comprises a plurality of image samples and manual labels corresponding to the image samples;
S2, constructing a part of the image samples and the manual labels corresponding to the image samples as a support set, and constructing the other part as a query set;
S3, constructing a scene memory storage, and optimizing the scene memory storage by using a support set and a query set of a plurality of tasks;
s4, constructing a diffusion model based on the scene memory drive according to the scene memory;
S5, optimizing a diffusion model based on the scene memory drive by using a support set and a query set of a plurality of tasks;
S6, constructing a new task, wherein the new task comprises a plurality of images to be predicted, and predicting the images to be predicted by using an optimized diffusion model based on the scene memory drive.
This embodiment builds a scene-memory-driven diffusion model and avoids task-specific local-minimum traps by updating the base-model parameters incrementally. First, a scene memory store is built from neural-network checkpoints and used as a prompt to guide updates of the base model on new tasks. Second, a conditional diffusion model serves as the optimizer: conditioned on the parameters to be updated, guided by the scene memory, and combined with the diffusion time step, it predicts the optimal parameter update required by the expected base model, so that the base model can be optimized with unfamiliar parameters after only one or a few updates at meta-test time, effectively resolving the parameter-noise problem of the diffusion model and improving the accuracy of the prediction results.
In a specific implementation process, step S1 of this embodiment specifically includes:
assume that there is a task distribution From which the task distribution can be sampled a number of times, for each task sampledThe method comprises a plurality of image samples x i and artificial labels y i corresponding to the image samples, i represents an ith image sample, and t represents a tth task;
The step S2 of this embodiment specifically includes:
the plurality of image samples x_i of task T_t and their corresponding manual annotations y_i are partly constructed into a support set D^S and partly into a query set D^Q, where the support set contains data of C categories with K samples per category, and the query set contains M samples in total.
The step S3 of this embodiment specifically includes:
s301, constructing a scene memory for storing task keywords and value fields;
s302, performing internal circulation optimization on initial parameters of a scene memory by using a support set of a plurality of tasks to obtain parameters after the support set optimization;
S303, after the internal circulation updating is finished, using a query set of a plurality of tasks to carry out external circulation updating on parameters after the scene memory support set is optimized, and obtaining parameters after the query set is optimized;
s304, optimizing the scene memory according to the parameters after the support set optimization and the parameters after the query set optimization.
This embodiment optimizes the scene memory store with meta-learning-based gradient optimization. As shown in fig. 2, x_S denotes the support-set input in few-shot learning, i.e. the support-set image samples of this embodiment; y_S denotes the true support-set output, i.e. the manual annotations corresponding to the support-set image samples; ŷ_S denotes the prediction corresponding to x_S, i.e. the annotation predicted for the support-set image samples by the current encoder and predictor; x_Q denotes the query-set input, i.e. the query-set image samples; y_Q denotes the true query-set output, i.e. the manual annotations corresponding to the query-set image samples; and ŷ_Q denotes the prediction corresponding to x_Q, i.e. the annotation predicted for the query-set image samples by the current encoder and predictor.
The step S301 in this embodiment specifically includes:
constructing the scene memory store M = {M_1, M_2, …, M_N},
where N is the number of storage cells in the scene memory store and is no greater than its capacity; each cell M_i stores a key-value pair, i.e. a keyword and a value field, denoted M_i = [K_i, V_i];
where V_i is the value field of the i-th task and K_i is the keyword of the i-th task, specifically expressed as:
K_i = Transformer([cls, e_1, …, e_n])[0];
where n is the total number of image samples in the support and query sets of the i-th task, e_j = g_φ(x_j) is the feature representation of image sample x_j after it passes through encoder g_φ, x_j is the j-th image sample in the support set of the i-th task, and cls is the class-placeholder token of the i-th task.
As shown in FIG. 2, each cell of the context memory stores a key value pair, namely a key and a value field value, denoted by M i=[Ki,Vi. Wherein K i represents the keyword representation of the ith task, which is an output representation corresponding to the category occupation information after the task is subjected to the conversion according to the characteristics of the task on the support set and the query set and the category occupation information cls; the value field V i stores key checkpoints in different tasks, such as model parameters, loss, and accuracy.
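The key construction above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the Transformer is replaced by a stand-in mixing function, since only the interface — prepend the cls token, take the output at position 0 — is exercised here, and all names are illustrative.

```python
import numpy as np

def toy_transformer(tokens: np.ndarray) -> np.ndarray:
    """Stand-in for the Transformer: each output position is the input
    token mixed with the mean over all tokens (a crude form of attention)."""
    ctx = tokens.mean(axis=0, keepdims=True)
    return 0.5 * tokens + 0.5 * ctx

def task_key(features: np.ndarray, cls: np.ndarray) -> np.ndarray:
    """K_i = Transformer([cls, e_1, ..., e_n])[0]: prepend the class
    placeholder token, run the sequence model, keep the output at position 0."""
    tokens = np.vstack([cls[None, :], features])
    return toy_transformer(tokens)[0]

rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 8))   # e_1..e_n: encoded support/query samples
cls = np.zeros(8)                 # class placeholder token
K_i = task_key(feats, cls)        # keyword of the ith task
V_i = {"phi_q": None, "phi_s": None, "avg_loss": None, "acc": None}
M_i = (K_i, V_i)                  # one storage unit: a key-value pair
```

Because the key is read off at the cls position, it summarizes the whole task in a single vector, which later makes the cosine-distance lookup a simple vector comparison.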
The step S302 of this embodiment specifically includes:
The support set of the ith task T_i — the image samples x_j and their corresponding manual annotations y_j — is used to optimize the original parameters φ, specifically expressed as:
φ_i = φ − α·∇_φ (1/S) Σ_{j=1}^{S} L(f_φ(x_j), y_j);
where α denotes the learning rate of the inner-loop optimization, S denotes the number of image samples in the support set, f_φ(x_j) denotes the prediction for image sample x_j obtained by the encoder and predictor with the original parameters φ, L denotes the loss function, and φ_i denotes the support-set-optimized parameters;
the parameters φ_i are updated iteratively using all image samples in the support set of task T_i;
the parameter φ_i corresponding to each task is updated iteratively over the task set;
obtaining the support-set-optimized parameter φ_i for each task.
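The inner-loop update above is a gradient step on the mean support-set loss. A minimal sketch with a toy linear model and squared loss (so the gradient is analytic); the model, learning rate, and names are illustrative, not from the patent:

```python
import numpy as np

def inner_update(phi, x_support, y_support, alpha=0.2):
    """One inner-loop step: phi_i = phi - alpha * grad_phi of the mean
    loss over the S support samples; the model is a toy linear predictor
    f_phi(x) = x . phi with squared loss, so the gradient is analytic."""
    S = len(x_support)
    residual = x_support @ phi - y_support        # f_phi(x_j) - y_j
    return phi - alpha * (2.0 / S) * x_support.T @ residual

rng = np.random.default_rng(1)
x_s = rng.normal(size=(10, 3))                    # support-set features
true_phi = np.array([1.0, -2.0, 0.5])
y_s = x_s @ true_phi                              # manual annotations
phi = np.zeros(3)                                 # original parameters
for _ in range(500):                              # iterate over the support set
    phi = inner_update(phi, x_s, y_s)
```

Repeating the step over the support set drives φ toward the task-specific optimum, which is what the support-set-optimized parameter φ_i captures.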
After the inner-loop update is completed, the outer loop starts; that is, step S303 of this embodiment specifically includes:
The image samples x in the query set of the ith task and their corresponding manual annotations y are used to optimize the support-set-optimized parameter φ_i, specifically expressed as:
φ'_i = φ_i − γ·∇_{φ_i} (1/Q) Σ_{(x,y)} L(f_{φ_i}(x), y);
where γ denotes the learning rate of the outer-loop optimization, Q denotes the number of image samples in the query set, and f_{φ_i}(x) denotes the prediction for image sample x using the support-set-optimized parameter φ_i;
the parameter φ'_i corresponding to each task is updated iteratively over the task set using the query set of each task;
obtaining the query-set-optimized parameter φ'_i for each task.
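The alternating inner/outer update can be sketched in the same toy setting; this is the first-order variant of the scheme (the outer gradient is not differentiated through the inner step), and all names are illustrative:

```python
import numpy as np

def grad_mse(phi, x, y):
    """Gradient of the mean squared loss for a toy linear model f_phi(x) = x . phi."""
    return (2.0 / len(x)) * x.T @ (x @ phi - y)

def adapt(phi, task, alpha=0.1, gamma=0.05):
    """Inner step on the support set, then outer step on the query set.
    First-order variant: the outer gradient is evaluated at phi_i
    directly rather than differentiated through the inner step."""
    (x_s, y_s), (x_q, y_q) = task
    phi_i = phi - alpha * grad_mse(phi, x_s, y_s)            # support-set-optimized
    phi_i_prime = phi_i - gamma * grad_mse(phi_i, x_q, y_q)  # query-set-optimized
    return phi_i, phi_i_prime

rng = np.random.default_rng(2)
w = np.array([0.5, 1.5])                          # ground-truth task parameters
x_s, x_q = rng.normal(size=(8, 2)), rng.normal(size=(8, 2))
task = ((x_s, x_s @ w), (x_q, x_q @ w))
phi = np.zeros(2)
for _ in range(500):                              # alternate inner and outer steps
    _, phi = adapt(phi, task)
```

Each call returns both φ_i and φ'_i, which is exactly the pair the scene memory later stores in the value field V_i.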
The step S304 in this embodiment specifically includes:
The query-set-optimized parameter φ'_i of the ith task, the support-set-optimized parameter φ_i, the average loss, and the accuracy U — obtained by comparing the predictions made with the query-set-optimized parameter φ'_i against those made with the original parameter φ — are stored in the scene memory as the value field V_i, where f_{φ'_i}(x) denotes the prediction for image sample x using the query-set-optimized parameter φ'_i.
In the meta-learning framework, tasks are sampled from a task distribution, generating a sequence of tasks. The core idea of meta-learning is to find a generic meta-learner on the training tasks of the meta-training phase; accordingly, for each task T_i, the parameter φ is brought to its optimum by alternating the inner and outer loops. The inner loop optimizes the parameters with the support-set sample data, yielding the support-set-optimized parameter φ_i; the outer loop optimizes the parameters with the query-set sample data, yielding the query-set-optimized parameter φ'_i.
In the specific implementation, when the scene memory is updated and the number of tasks grows beyond the maximum capacity of the scene memory, a first-in-first-out (FIFO) policy is adopted: the storage unit that entered the scene memory earliest is replaced, and the pair M_i = [K_i, V_i] obtained from the new task is stored in the scene memory.
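A FIFO-replaced key-value store of this kind can be sketched as follows; the class name and value-field layout are illustrative, not from the patent:

```python
import numpy as np

class SceneMemory:
    """N storage units M_i = [K_i, V_i] with first-in-first-out (FIFO)
    replacement once the capacity is exceeded, and nearest-key lookup
    by cosine distance."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.units = []                  # list of (key, value-field) pairs

    def write(self, key, value):
        if len(self.units) >= self.capacity:
            self.units.pop(0)            # evict the unit that entered first
        self.units.append((key, value))

    def read(self, key):
        def cos_dist(k):
            return 1.0 - key @ k / (np.linalg.norm(key) * np.linalg.norm(k))
        return min(self.units, key=lambda unit: cos_dist(unit[0]))[1]

mem = SceneMemory(capacity=2)
mem.write(np.array([1.0, 0.0]), {"task": "A"})
mem.write(np.array([0.0, 1.0]), {"task": "B"})
mem.write(np.array([1.0, 1.0]), {"task": "C"})   # capacity exceeded: "A" is evicted
```

FIFO keeps the memory bounded while favoring the most recently seen tasks; cosine lookup is what later matches a new task's key K against the stored keywords.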
The step S4 of this embodiment specifically includes:
The query-set-optimized parameter φ', the randomly initialized initial parameter φ, the prompt, the initial information, and the timestamp t are used as the inputs of the scene-memory-driven diffusion model;
the prompt comprises the support-set-optimized parameter φ_r, the average loss, and the accuracy U stored in the scene memory; the initial information comprises the average loss and accuracy obtained on the current task, used as the initial loss and initial accuracy; the timestamp t is the number of loop-optimization iterations in the scene memory;
the output of the scene-memory-driven diffusion model is expressed as a denoising prediction of the task-specific parameters;
where Θ_θ denotes the diffusion model and θ its parameters, obtained through a Transformer; ε denotes the noise added as the diffusion process proceeds, sampled at random from a Gaussian distribution; and α denotes the learning rate of the inner-loop optimization.
This embodiment uses the diffusion model to predict signals, i.e., the diffusion model is parameterized to output task-specific parameters. The overall technical framework is shown in FIG. 3. The inputs of the scene-memory-driven diffusion model comprise the following parts:
(1) The query-set-optimized parameter φ': the cosine distance is computed between the K_i obtained for the current task and the keys K of all units in the scene memory; the parameter φ'_r stored in the value field V_r of the unit with the smallest distance is selected as the first input φ'. This input is generally noisy, and this embodiment combines the other information to denoise φ' for a more accurate prediction;
(2) The initial parameter φ: randomly initialized parameters;
(3) The prompt: the input part indicated by the dashed line in FIG. 3, comprising the support-set-optimized parameter φ_r, the average loss, and the accuracy U stored in the scene memory;
(4) The initial information: the initial loss and the initial accuracy computed from the support set and the query set of the current task, following the flow of FIG. 2;
(5) The timestamp t: the timestamp of the diffusion model, whose specific value can be tuned experimentally.
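The five inputs above can be gathered into one structure. A minimal sketch; every field name here is illustrative, not from the patent:

```python
from dataclasses import dataclass

@dataclass
class DiffusionInput:
    """The five inputs of the scene-memory-driven diffusion model;
    field names are illustrative, not from the patent."""
    phi_query: list   # (1) query-set-optimized parameter phi', from the nearest unit
    phi_init: list    # (2) randomly initialized initial parameter phi
    prompt: dict      # (3) {phi_r, average loss, accuracy U} from the scene memory
    init_info: dict   # (4) initial loss/accuracy of the current task
    t: int            # (5) diffusion timestamp

inp = DiffusionInput(
    phi_query=[0.1, -0.3],
    phi_init=[0.0, 0.0],
    prompt={"phi_r": [0.2, -0.1], "avg_loss": 0.8, "accuracy": 0.6},
    init_info={"loss": 1.2, "accuracy": 0.4},
    t=50,
)
```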
From the above five inputs, the output of the scene-memory-driven diffusion model is obtained as expressed in step S4 above.
The step S5 in this embodiment specifically includes:
Optimizing the objective function and constructing the total loss function, whose terms are as follows:
ε denotes the noise added as the diffusion process proceeds, sampled at random from a Gaussian distribution; θ denotes the diffusion model parameters, obtained through a Transformer; α denotes the learning rate of the inner-loop optimization; x_q denotes an image sample and ŷ_q its corresponding prediction label; Q denotes the number of image samples in the query set; E denotes the expectation; λ denotes an adjustment parameter; and p(ŷ_q | x_q, φ) denotes the probability of the prediction ŷ_q given x_q and the parameter φ;
the parameters θ and φ of the scene-memory-driven diffusion model are optimized using the total loss function.
In this embodiment, the diffusion model represented by Θ_θ is implemented with a Transformer with parameters θ, and the total loss function is then used to optimize the scene-memory-driven diffusion model parameter θ and the baseline network parameter φ; here E denotes the expectation and λ is an adjustment parameter whose specific value can be tuned experimentally.
Step S6 of this embodiment, i.e., predicting the image to be predicted using the optimized scene-memory-driven diffusion model, specifically includes:
constructing a new task T_new, the new task comprising a number of images to be predicted, divided into a support set and a query set;
obtaining the keyword K corresponding to the new task;
computing the cosine distances between the keyword K and the keywords K_j of all storage units in the scene memory;
selecting the value field V_r corresponding to the keyword K_r with the smallest cosine distance; obtaining the query-set-optimized parameter φ'_r, the support-set-optimized parameter φ_r, the average loss, and the accuracy U stored in the value field V_r; and constructing the support-set-optimized parameter φ_r, the average loss, and the accuracy U into the prompt;
computing the initial loss and the initial accuracy with the support set and query set of the new task, generating the initial information;
randomly generating the initial parameter φ corresponding to the new task;
inputting the obtained query-set-optimized parameter φ'_r, the prompt, the initial information, the initial parameter φ of the new task, and the timestamp t into the scene-memory-driven diffusion model, generating the prediction parameter φ' corresponding to the new task;
completing the prediction for the new task with the query set of the new task and the prediction parameter φ'.
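The retrieval steps above — compute the new task's key, take the cosine-nearest unit, and build the prompt from its value field — can be sketched as follows; the dictionary layout and names are illustrative, not from the patent:

```python
import numpy as np

def cosine_distance(a, b):
    return 1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def retrieve_prompt(K_new, memory):
    """Find the unit whose keyword K_r is closest to the new task's key K
    (smallest cosine distance), then build the prompt from its value field
    V_r = {phi'_r, phi_r, average loss, accuracy U}."""
    r = min(memory, key=lambda i: cosine_distance(K_new, memory[i]["K"]))
    V_r = memory[r]["V"]
    prompt = {"phi_r": V_r["phi_r"], "avg_loss": V_r["avg_loss"], "acc": V_r["acc"]}
    return V_r["phi_q_r"], prompt     # phi'_r and the prompt

memory = {
    0: {"K": np.array([1.0, 0.0]),
        "V": {"phi_q_r": [0.3], "phi_r": [0.2], "avg_loss": 0.9, "acc": 0.5}},
    1: {"K": np.array([0.0, 1.0]),
        "V": {"phi_q_r": [0.7], "phi_r": [0.6], "avg_loss": 0.4, "acc": 0.8}},
}
phi_q_r, prompt = retrieve_prompt(np.array([0.1, 1.0]), memory)
```

The retrieved φ'_r and prompt then go into the diffusion model together with the initial parameters, initial information, and timestamp.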
In this embodiment, the image to be predicted is predicted through the scene memory and the scene-memory-driven diffusion model, which effectively reduces noise in the prediction process; compared with the prior art, the prediction results are more accurate.
Example 2
As shown in fig. 2, the present embodiment provides a system for constructing an optimized diffusion model based on a context memory driver, which is used for implementing the method for constructing an optimized diffusion model based on a context memory driver provided in embodiment 1, and the system specifically includes:
The task construction module 101 is configured to construct a plurality of tasks, where each task includes a plurality of image samples and manual labels corresponding to the image samples; one part of the image samples and the manual labels corresponding to the image samples are constructed as a support set, and the other part is constructed as a query set;
A scene memory constructing module 102, configured to construct a scene memory and optimize the scene memory using the support sets and query sets of a number of tasks; specifically comprising:
constructing a scene memory for storing task keywords and value fields;
constructing the scene memory M = {M_1, M_2, …, M_N}, where N denotes the number of storage units in the scene memory, N being no larger than its storage capacity; each storage unit M_i stores a key-value pair, i.e. a keyword and a value field, denoted M_i = [K_i, V_i]; V_i denotes the value field of the ith task and K_i its keyword, specifically K_i = Transformer([cls, e_1, …, e_n])[0]; n denotes the total number of image samples in the support set and query set of the ith task, e_j = g_φ(x_j) is the feature representation of image sample x_j after passing through the encoder g_φ, x_j denotes the jth image sample of the ith task, and cls denotes the class placeholder token of the ith task;
performing inner-loop optimization on the initial parameters of the scene memory using the support sets of the tasks, obtaining the support-set-optimized parameters;
the support set of the ith task T_i — the image samples x_j and their manual annotations y_j — is used to optimize the original parameters φ, specifically φ_i = φ − α·∇_φ (1/S) Σ_{j=1}^{S} L(f_φ(x_j), y_j), where α denotes the learning rate of the inner-loop optimization, S the number of image samples in the support set, f_φ(x_j) the prediction for x_j obtained by the encoder and predictor with the original parameters φ, L the loss function, and φ_i the support-set-optimized parameters; the parameters φ_i are updated iteratively using all image samples in the support set of task T_i; the parameter φ_i corresponding to each task is updated iteratively over the task set; obtaining the support-set-optimized parameter φ_i for each task;
after the inner-loop update is completed, performing outer-loop update on the support-set-optimized parameters of the scene memory using the query sets of the tasks, obtaining the query-set-optimized parameters;
the image samples x in the query set of the ith task and their manual annotations y are used to optimize the support-set-optimized parameter φ_i, specifically φ'_i = φ_i − γ·∇_{φ_i} (1/Q) Σ_{(x,y)} L(f_{φ_i}(x), y), where γ denotes the learning rate of the outer-loop optimization, Q the number of image samples in the query set, and f_{φ_i}(x) the prediction for x using the support-set-optimized parameter φ_i; the parameter φ'_i corresponding to each task is updated iteratively over the task set using the query set of each task; obtaining the query-set-optimized parameter φ'_i for each task;
optimizing the scene memory according to the support-set-optimized parameters and the query-set-optimized parameters;
the query-set-optimized parameter φ'_i of the ith task, the support-set-optimized parameter φ_i, the average loss, and the accuracy U — obtained by comparing the predictions made with the query-set-optimized parameter φ'_i against those made with the original parameter φ — are stored in the scene memory as the value field V_i, where f_{φ'_i}(x) denotes the prediction for image sample x using the query-set-optimized parameter φ'_i.
A diffusion model constructing module 103, configured to construct a scene-memory-driven diffusion model according to the scene memory; specifically comprising:
using the query-set-optimized parameter φ', the randomly initialized initial parameter φ, the prompt, the initial information, and the timestamp t as the inputs of the scene-memory-driven diffusion model;
wherein the prompt comprises the support-set-optimized parameter φ_r, the average loss, and the accuracy U stored in the scene memory; the initial information comprises the average loss and accuracy obtained on the current task, used as the initial loss and initial accuracy; and the timestamp t is the number of loop-optimization iterations in the scene memory;
the output of the scene-memory-driven diffusion model is expressed as a denoising prediction of the task-specific parameters;
where Θ_θ denotes the diffusion model and θ its parameters, obtained through a Transformer; ε denotes the noise added as the diffusion process proceeds, sampled at random from a Gaussian distribution; and α denotes the learning rate of the inner-loop optimization.
A diffusion model optimizing module 104, configured to optimize the scene-memory-driven diffusion model using the support sets and query sets of a number of tasks; specifically comprising:
optimizing the objective function and constructing the total loss function, whose terms are as follows:
ε denotes the noise added as the diffusion process proceeds, sampled at random from a Gaussian distribution; θ denotes the diffusion model parameters, obtained through a Transformer; α denotes the learning rate of the inner-loop optimization; x_q denotes an image sample and ŷ_q its corresponding prediction label; Q denotes the number of image samples in the query set; E denotes the expectation; λ denotes an adjustment parameter; and p(ŷ_q | x_q, φ) denotes the probability of the prediction ŷ_q given x_q and the parameter φ;
the parameters θ and φ of the scene-memory-driven diffusion model are optimized using the total loss function.
A prediction module 105, configured to construct a new task comprising a number of images to be predicted, and predict the images to be predicted using the optimized scene-memory-driven diffusion model; specifically comprising:
constructing a new task T_new, the new task comprising a number of images to be predicted, divided into a support set and a query set;
obtaining the keyword K corresponding to the new task;
computing the cosine distances between the keyword K and the keywords K_j of all storage units in the scene memory;
selecting the value field V_r corresponding to the keyword K_r with the smallest cosine distance; obtaining the query-set-optimized parameter φ'_r, the support-set-optimized parameter φ_r, the average loss, and the accuracy U stored in the value field V_r; and constructing the support-set-optimized parameter φ_r, the average loss, and the accuracy U into the prompt;
computing the initial loss and the initial accuracy with the support set and query set of the new task, generating the initial information;
randomly generating the initial parameter φ corresponding to the new task;
inputting the obtained query-set-optimized parameter φ'_r, the prompt, the initial information, the initial parameter φ of the new task, and the timestamp t into the scene-memory-driven diffusion model, generating the prediction parameter φ' corresponding to the new task;
completing the prediction for the new task with the query set of the new task and the prediction parameter φ'.
In this embodiment, a scene memory is first constructed from neural-network checkpoints and used as a prompt to guide the updating of the base model on new tasks; second, the conditional diffusion model serves as the optimizer: the parameters to be updated act as the condition of the diffusion model, the scene memory acts as guidance, and, combined with the diffusion time steps, the model predicts the optimal parameter update required by the expected base model. This embodiment can therefore adapt the base model to unfamiliar parameters with only a single or a few updates at meta-test time, effectively addresses the noise problem of the diffusion model parameters, and improves the accuracy of the prediction results. Meanwhile, the embodiment is general and flexible, and can be integrated smoothly with existing optimization-based meta-learning models.
It should be understood that the foregoing examples of the present invention are merely illustrative of the present invention and are not intended to limit the present invention to the specific embodiments thereof. Any modification, equivalent replacement, improvement, etc. that comes within the spirit and principle of the claims of the present invention should be included in the protection scope of the claims of the present invention.

Claims (2)

1. The method for constructing the optimized diffusion model based on the scene memory driving is characterized by comprising the following steps of:
constructing a plurality of tasks, wherein each task comprises a plurality of image samples and manual labels corresponding to the image samples;
constructing a part of the image samples and the manual labels corresponding to the image samples as a support set, and constructing the other part as a query set;
Constructing a scene memory storage, and optimizing the scene memory storage by using a support set and a query set of a plurality of tasks;
Constructing a diffusion model based on scene memory driving according to the scene memory;
optimizing a diffusion model based on a scene memory driver by using a support set and a query set of a plurality of tasks;
Constructing a new task, wherein the new task comprises a plurality of images to be predicted, and predicting the images to be predicted by using an optimized diffusion model based on scene memory driving;
the construction of the scene memory storage uses a support set and a query set of a plurality of tasks to optimize the scene memory storage, and specifically comprises the following steps:
Constructing a scene memory for storing task keywords and value fields;
internal circulation optimization is carried out on initial parameters of the scene memory storage by using a support set of a plurality of tasks, and parameters after the support set optimization are obtained;
After the internal circulation updating is finished, the parameters after the scene memory support set optimization are subjected to external circulation updating by using the query sets of a plurality of tasks, so that the parameters after the query set optimization are obtained;
Optimizing the scene memory according to the parameters after optimizing the support set and the parameters after optimizing the query set;
the constructing a scene memory for storing task keywords and value fields specifically comprises:
constructing the scene memory M = {M_1, M_2, …, M_N};
where N denotes the number of storage units in the scene memory, N being no larger than its storage capacity; each storage unit M_i stores a key-value pair, i.e. a keyword and a value field, denoted M_i = [K_i, V_i];
where V_i denotes the value field of the ith task and K_i denotes the keyword of the ith task, specifically expressed as: K_i = Transformer([cls, e_1, …, e_n])[0];
where n denotes the total number of image samples in the support set and query set of the ith task, e_j = g_φ(x_j) is the feature representation of image sample x_j after passing through the encoder g_φ, x_j denotes the jth image sample of the ith task, and cls denotes the class placeholder token of the ith task;
the performing inner-loop optimization on the initial parameters of the scene memory using the support sets of a number of tasks to obtain the support-set-optimized parameters specifically comprises:
using the support set of the ith task T_i — the image samples x_j and their corresponding manual annotations y_j — to optimize the original parameters φ, specifically expressed as:
φ_i = φ − α·∇_φ (1/S) Σ_{j=1}^{S} L(f_φ(x_j), y_j);
where α denotes the learning rate of the inner-loop optimization, S denotes the number of image samples in the support set, f_φ(x_j) denotes the prediction for image sample x_j obtained by the encoder and predictor with the original parameters φ, L denotes the loss function, and φ_i denotes the support-set-optimized parameters;
updating the parameters φ_i iteratively using all image samples in the support set of task T_i;
updating the parameter φ_i corresponding to each task iteratively over the task set;
obtaining the support-set-optimized parameter φ_i for each task;
the performing outer-loop update on the support-set-optimized parameters of the scene memory using the query sets of a number of tasks to obtain the query-set-optimized parameters specifically comprises:
using the image samples x in the query set of the ith task and their corresponding manual annotations y to optimize the support-set-optimized parameter φ_i, specifically expressed as:
φ'_i = φ_i − γ·∇_{φ_i} (1/Q) Σ_{(x,y)} L(f_{φ_i}(x), y);
where γ denotes the learning rate of the outer-loop optimization, Q denotes the number of image samples in the query set, and f_{φ_i}(x) denotes the prediction for image sample x using the support-set-optimized parameter φ_i;
updating the parameter φ'_i corresponding to each task iteratively over the task set using the query set of each task;
obtaining the query-set-optimized parameter φ'_i for each task;
the optimizing the scene memory according to the support-set-optimized parameters and the query-set-optimized parameters specifically comprises:
storing the query-set-optimized parameter φ'_i of the ith task, the support-set-optimized parameter φ_i, the average loss, and the accuracy U — obtained by comparing the predictions made with the query-set-optimized parameter φ'_i against those made with the original parameter φ — in the scene memory as the value field V_i, where f_{φ'_i}(x) denotes the prediction for image sample x using the query-set-optimized parameter φ'_i;
the constructing a scene-memory-driven diffusion model according to the scene memory specifically comprises:
using the query-set-optimized parameter φ', the randomly initialized initial parameter φ, the prompt, the initial information, and the timestamp t as the inputs of the scene-memory-driven diffusion model;
wherein the prompt comprises the support-set-optimized parameter φ_r, the average loss, and the accuracy U stored in the scene memory; the initial information comprises the average loss and accuracy obtained on the current task, used as the initial loss and initial accuracy; and the timestamp t is the number of loop-optimization iterations in the scene memory;
the output of the scene-memory-driven diffusion model is expressed as a denoising prediction of the task-specific parameters;
where Θ_θ denotes the diffusion model and θ its parameters, obtained through a Transformer; ε denotes the noise added as the diffusion process proceeds, sampled at random from a Gaussian distribution; and α denotes the learning rate of the inner-loop optimization;
the optimizing the scene-memory-driven diffusion model using the support sets and query sets of a number of tasks specifically comprises:
optimizing the objective function and constructing the total loss function, whose terms are as follows:
ε denotes the noise added as the diffusion process proceeds, sampled at random from a Gaussian distribution; θ denotes the diffusion model parameters, obtained through a Transformer; α denotes the learning rate of the inner-loop optimization; x_q denotes an image sample and ŷ_q its corresponding prediction label; Q denotes the number of image samples in the query set; E denotes the expectation; λ denotes an adjustment parameter; and p(ŷ_q | x_q, φ) denotes the probability of the prediction ŷ_q given x_q and the parameter φ;
optimizing the parameters θ and φ of the scene-memory-driven diffusion model using the total loss function;
the constructing a new task comprising a number of images to be predicted, and predicting the images to be predicted using the optimized scene-memory-driven diffusion model, specifically comprises:
constructing a new task T_new, the new task comprising a number of images to be predicted, divided into a support set and a query set;
obtaining the keyword K corresponding to the new task;
computing the cosine distances between the keyword K and the keywords K_j of all storage units in the scene memory;
selecting the value field V_r corresponding to the keyword K_r with the smallest cosine distance; obtaining the query-set-optimized parameter φ'_r, the support-set-optimized parameter φ_r, the average loss, and the accuracy U stored in the value field V_r; and constructing the support-set-optimized parameter φ_r, the average loss, and the accuracy U into the prompt;
computing the initial loss and the initial accuracy with the support set and query set of the new task, generating the initial information;
randomly generating the initial parameter φ corresponding to the new task;
inputting the obtained query-set-optimized parameter φ'_r, the prompt, the initial information, the initial parameter φ of the new task, and the timestamp t into the scene-memory-driven diffusion model, generating the prediction parameter φ' corresponding to the new task;
completing the prediction for the new task with the query set of the new task and the prediction parameter φ'.
2. A system for constructing an optimized diffusion model based on a context memory driver, comprising:
The task construction module is used for constructing a plurality of tasks, and each task comprises a plurality of image samples and manual labels corresponding to the image samples; one part of the image samples and the manual labels corresponding to the image samples are constructed as a support set, and the other part is constructed as a query set;
The scene memory storage construction module is used for constructing a scene memory storage and optimizing the scene memory storage by using a support set and a query set of a plurality of tasks;
the diffusion model construction module is used for constructing a diffusion model based on the scene memory drive according to the scene memory;
the diffusion model optimization module is used for optimizing a diffusion model based on the scene memory driving by using a support set and a query set of a plurality of tasks;
the prediction module is used for constructing a new task, wherein the new task comprises a plurality of images to be predicted, and the optimized diffusion model based on the scene memory drive is used for predicting the images to be predicted;
the construction of the scene memory storage uses a support set and a query set of a plurality of tasks to optimize the scene memory storage, and specifically comprises the following steps:
Constructing a scene memory for storing task keywords and value fields;
internal circulation optimization is carried out on initial parameters of the scene memory storage by using a support set of a plurality of tasks, and parameters after the support set optimization are obtained;
After the internal circulation updating is finished, the parameters after the scene memory support set optimization are subjected to external circulation updating by using the query sets of a plurality of tasks, so that the parameters after the query set optimization are obtained;
Optimizing the scene memory according to the parameters after optimizing the support set and the parameters after optimizing the query set;
the constructing a scene memory for storing task keywords and value fields specifically comprises:
constructing the scene memory M = {M_1, M_2, …, M_N};
where N denotes the number of storage units in the scene memory, N being no larger than its storage capacity; each storage unit M_i stores a key-value pair, i.e. a keyword and a value field, denoted M_i = [K_i, V_i];
where V_i denotes the value field of the ith task and K_i denotes the keyword of the ith task, specifically expressed as: K_i = Transformer([cls, e_1, …, e_n])[0];
where n denotes the total number of image samples in the support set and query set of the ith task, e_j = g_φ(x_j) is the feature representation of image sample x_j after passing through the encoder g_φ, x_j denotes the jth image sample of the ith task, and cls denotes the class placeholder token of the ith task;
the performing of inner-loop optimization on the initial parameters of the scene memory by using the support sets of a plurality of tasks to obtain support-set-optimized parameters specifically comprises the following steps:
optimizing the original parameters φ with the image samples x_j in the support set of the i-th task and their corresponding manual annotations y_j, specifically expressed as:
φ_i = φ − α · (1/S) · Σ_{j=1}^{S} ∇_φ L(f_φ(x_j), y_j);
wherein α denotes the learning rate of the inner-loop optimization, S denotes the number of image samples in the support set, f_φ(x_j) denotes the predicted annotation of image sample x_j obtained with the original parameters φ through the encoder and the predictor, L denotes the loss between the predicted annotation and the manual annotation, and φ_i denotes the support-set-optimized parameters;
iteratively updating the parameters φ_i with all image samples in the support set of the i-th task;
iteratively updating the parameters φ_i corresponding to each task in the task set;
obtaining the support-set-optimized parameters φ_i corresponding to each task.
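The inner-loop step above has the shape of a standard support-set gradient update. A minimal sketch, assuming (for illustration only) a linear predictor in place of the encoder-plus-predictor f_φ and a squared loss in place of the task loss:

```python
import numpy as np

def predict(phi, X):
    # f_phi: a linear model standing in for the encoder and predictor
    return X @ phi

def inner_update(phi, support_X, support_y, alpha):
    # one inner-loop step: phi_i = phi - alpha * grad of mean squared loss
    S = len(support_X)
    grad = 2.0 / S * support_X.T @ (predict(phi, support_X) - support_y)
    return phi - alpha * grad

rng = np.random.default_rng(1)
phi = rng.normal(size=4)                     # shared original parameters
# three tasks, each with a support set of S = 6 samples
tasks = [(rng.normal(size=(6, 4)), rng.normal(size=6)) for _ in range(3)]
# one support-set-optimized phi_i per task, as in the text
phis = [inner_update(phi, X, y, alpha=0.1) for X, y in tasks]
```

Each task keeps its own adapted copy φ_i while the original φ is left untouched, which is what the outer loop relies on.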
the performing of an outer-loop update on the support-set-optimized parameters of the scene memory by using the query sets of a plurality of tasks to obtain query-set-optimized parameters specifically comprises the following steps:
optimizing the support-set-optimized parameters φ_i with the image samples x in the query set of the i-th task and their corresponding manual annotations y, specifically expressed as:
φ′_i = φ_i − γ · (1/Q) · Σ_{(x,y)} ∇_{φ_i} L(f_{φ_i}(x), y);
wherein γ denotes the learning rate of the outer-loop optimization, Q denotes the number of image samples in the query set, and f_{φ_i}(x) denotes the predicted annotation of image sample x obtained with the support-set-optimized parameters φ_i;
iteratively updating the parameters φ′_i corresponding to each task with the query set of each task in the task set;
obtaining the query-set-optimized parameters φ′_i corresponding to each task.
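The outer-loop step mirrors the inner one, but starts from the support-adapted φ_i and descends on the query set with its own learning rate γ. Again a sketch with a hypothetical linear model and squared loss:

```python
import numpy as np

def grad_mse(phi, X, y):
    # gradient of the mean squared loss of a linear predictor
    return 2.0 / len(X) * X.T @ (X @ phi - y)

def outer_update(phi_i, query_X, query_y, gamma):
    # phi'_i = phi_i - gamma * query-set gradient at the adapted parameters
    return phi_i - gamma * grad_mse(phi_i, query_X, query_y)

rng = np.random.default_rng(2)
phi_i = rng.normal(size=4)                   # support-set-optimized parameters
query_X, query_y = rng.normal(size=(8, 4)), rng.normal(size=8)  # Q = 8
phi_prime = outer_update(phi_i, query_X, query_y, gamma=0.05)
```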
the optimizing of the scene memory according to the support-set-optimized parameters and the query-set-optimized parameters specifically comprises the following steps:
storing, as the value field V_i in the scene memory, the query-set-optimized parameters φ′_i of the i-th task, the support-set-optimized parameters φ_i, the average loss, and the accuracy U, wherein the accuracy U is obtained from the predicted annotations produced with the query-set-optimized parameters φ′_i and the predicted annotations produced with the original parameters φ, and f_{φ′_i}(x) denotes the predicted annotation of image sample x with the query-set-optimized parameters φ′_i;
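The write into the scene memory can be sketched as a capacity-bounded list of key-value units; the class name and value-field names below are hypothetical, but each unit bundles exactly the four quantities the text names.

```python
import numpy as np

class SceneMemory:
    """Fixed-capacity store of (key, value) units M_i = [K_i, V_i]."""
    def __init__(self, capacity):
        self.capacity = capacity             # N must stay <= capacity
        self.units = []                      # each unit: (K_i, V_i)

    def write(self, key, phi_prime, phi_i, avg_loss, accuracy):
        # V_i bundles the query-set-optimized and support-set-optimized
        # parameters together with the task's average loss and accuracy U
        value = {"phi_prime": phi_prime, "phi_i": phi_i,
                 "avg_loss": avg_loss, "accuracy": accuracy}
        if len(self.units) < self.capacity:
            self.units.append((key, value))

mem = SceneMemory(capacity=4)
mem.write(np.ones(8), phi_prime=np.zeros(4), phi_i=np.ones(4),
          avg_loss=0.31, accuracy=0.87)
print(len(mem.units))  # 1
```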
the constructing of the scene-memory-driven diffusion model from the scene memory specifically comprises the following steps:
taking the query-set-optimized parameters φ′, the randomly initialized initial parameters φ, the prompt, the initial information, and the timestamp t as inputs of the scene-memory-driven diffusion model;
wherein the prompt comprises the support-set-optimized parameters φ_r, the average loss, and the accuracy U stored in the scene memory; the initial information comprises the average loss and accuracy obtained on the current task, serving as the initial loss and initial accuracy; and the timestamp t is the number of iterations of loop optimization in the scene memory;
the output of the scene-memory-driven diffusion model is expressed as the predicted parameters produced by Θ_θ;
wherein Θ_θ denotes the diffusion model and θ denotes the diffusion model parameters, obtained through a Transformer; ε denotes the noise injected as the diffusion process proceeds, which is sampled at random from a Gaussian distribution; and α denotes the learning rate of the inner-loop optimization.
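The input/output interface described above can be sketched with a toy denoiser. The two-layer network below is purely a placeholder for the Transformer-based Θ_θ; the dimensions, weight shapes, and the way the inputs are concatenated are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

def theta_theta(phi_noisy, prompt, info, t, W1, W2):
    # toy stand-in for the Transformer-based denoiser Theta_theta: it
    # consumes the noised parameters, the prompt, the initial information,
    # and the timestamp, and emits predicted parameters of the same size
    x = np.concatenate([phi_noisy, prompt, info, [t]])
    h = np.tanh(x @ W1)
    return h @ W2

d_phi, d_prompt, d_info = 4, 6, 2
phi = rng.normal(size=d_phi)                 # randomly initialized phi
eps = rng.normal(size=d_phi)                 # Gaussian noise epsilon
t = 5                                        # loop-iteration count as timestamp
W1 = rng.normal(size=(d_phi + d_prompt + d_info + 1, 16)) * 0.1
W2 = rng.normal(size=(16, d_phi)) * 0.1
prompt = rng.normal(size=d_prompt)           # stands in for [phi_r, avg loss, U]
info = rng.normal(size=d_info)               # stands in for [init loss, init acc]
phi_pred = theta_theta(phi + eps, prompt, info, t, W1, W2)
print(phi_pred.shape)  # (4,)
```

The essential point the sketch preserves is that the model maps noised parameters, conditioned on the prompt and initial information, back to a parameter vector of the original shape.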
the optimizing of the scene-memory-driven diffusion model by using the support sets and query sets of a plurality of tasks specifically comprises the following steps:
constructing an optimization objective function;
constructing a total loss function;
wherein ε denotes the noise injected as the diffusion process proceeds, sampled at random from a Gaussian distribution; θ denotes the diffusion model parameters, obtained through a Transformer; α denotes the learning rate of the inner-loop optimization; x_q denotes an image sample; ŷ_q denotes the predicted annotation corresponding to image sample x_q; Q denotes the number of image samples in the query set; E denotes the computed expectation; λ denotes an adjustment parameter; and p(ŷ_q | x_q, φ) denotes the probability of the prediction ŷ_q given x_q and the parameters φ;
optimizing the parameters θ and φ of the scene-memory-driven diffusion model with the total loss function.
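A total loss of the shape the symbols above suggest, a noise-prediction term plus a λ-weighted negative log-likelihood over the query set, can be sketched as follows. The exact combination is an assumption; the patent's own formula is not reproduced in the extracted text.

```python
import numpy as np

def total_loss(eps, eps_pred, query_logits, query_labels, lam):
    # diffusion term: squared error between true and predicted noise epsilon
    diff_term = np.mean((eps - eps_pred) ** 2)
    # prediction term: -log p(y_q | x_q, phi), averaged over the Q query samples
    probs = np.exp(query_logits - query_logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    nll = -np.mean(np.log(probs[np.arange(len(query_labels)), query_labels]))
    # lambda trades the two terms off against each other
    return diff_term + lam * nll

rng = np.random.default_rng(4)
eps = rng.normal(size=4)
eps_pred = eps + 0.1 * rng.normal(size=4)    # an imperfect noise prediction
logits = rng.normal(size=(8, 3))             # Q = 8 query samples, 3 classes
labels = rng.integers(0, 3, size=8)
loss = total_loss(eps, eps_pred, logits, labels, lam=0.5)
print(loss > 0)  # True
```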
the constructing of a new task, the new task comprising a plurality of images to be predicted, and the predicting of the images to be predicted by using the optimized scene-memory-driven diffusion model specifically comprises the following steps:
constructing a new task comprising a plurality of images to be predicted, the images to be predicted being divided into a support set and a query set;
acquiring the keyword K corresponding to the new task;
calculating the cosine distances between the keyword K and the keywords K_j of all storage units in the scene memory;
selecting the value field V_r corresponding to the keyword K_r with the minimum cosine distance, acquiring the query-set-optimized parameters φ′_r, the support-set-optimized parameters φ_r, the average loss, and the accuracy U stored in the value field V_r, and constructing the support-set-optimized parameters φ_r, the average loss, and the accuracy U as the prompt;
calculating the initial loss and initial accuracy with the support set of the new task and generating the initial information;
randomly generating the initial parameters φ corresponding to the new task;
inputting the acquired query-set-optimized parameters φ′_r, the prompt, the initial information, the initial parameters φ corresponding to the new task, and the timestamp t into the scene-memory-driven diffusion model to generate the prediction parameters φ′ corresponding to the new task;
completing the prediction of the query set of the new task with the prediction parameters φ′.
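The retrieval step at the core of this procedure, nearest keyword by cosine distance, then reading out the stored value field, can be sketched directly; the memory contents here are placeholder values.

```python
import numpy as np

def cosine_distance(a, b):
    return 1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def retrieve(memory, key):
    # pick the unit whose keyword K_r has minimum cosine distance to K
    dists = [cosine_distance(key, K_j) for K_j, _ in memory]
    r = int(np.argmin(dists))
    return memory[r][1]                      # the value field V_r

mem = [(np.array([1.0, 0.0]), "task A value"),
       (np.array([0.0, 1.0]), "task B value")]
K_new = np.array([0.9, 0.1])                 # keyword of the new task
print(retrieve(mem, K_new))  # task A value
```

The retrieved value field supplies φ′_r and the prompt components, which then condition the diffusion model for the new task.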
CN202311208862.9A 2023-09-18 2023-09-18 Method and system for constructing optimized diffusion model based on scene memory drive Active CN117274732B (en)

Publications (2)

Publication Number Publication Date
CN117274732A CN117274732A (en) 2023-12-22
CN117274732B true CN117274732B (en) 2024-07-16


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant