CN115641485A - Generative model training method and device - Google Patents

Generative model training method and device

Info

Publication number
CN115641485A
CN115641485A
Authority
CN
China
Prior art keywords
image
noise
processing
model
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211363105.4A
Other languages
Chinese (zh)
Inventor
张晗
冯睿蠡
阳展韬
黄梁华
刘宇
张轶飞
沈宇军
赵德丽
周靖人
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd
Priority to CN202211363105.4A
Publication of CN115641485A
Legal status: Pending

Landscapes

  • Image Processing (AREA)

Abstract

An embodiment of this specification provides a generative model training method and apparatus. The method comprises the following steps: acquiring an original image; performing diffusion processing on the original image while performing attenuation processing on its image components to obtain a noise-added image set; determining a noise-added image according to the original image and the noise-added image set, and inputting the noise-added image into an initial generation model for processing to obtain a restored image; and performing parameter adjustment on the initial generation model based on the restored image and the original image until a target generation model satisfying a training stop condition is obtained. Because attenuation of image components is combined with the diffusion and inverse-diffusion processes, the generation model can learn image changes across different dimensionalities, which effectively improves model training precision.

Description

Generative model training method and device
Technical Field
The embodiment of the specification relates to the technical field of machine learning, in particular to a generative model training method and device.
Background
With the development of internet technology, diffusion processes have been widely used to build generative models: by learning the inverse of the diffusion process, a generative model can generate samples from a given data distribution starting from Gaussian noise. The diffusion process is iterative; in each step, further Gaussian noise is added to the current noise-added data. Because noise is added step by step, the dimensionality of the noise-added data never changes and always matches that of the original data. Likewise, the inverse of the diffusion process is also iterative and also preserves dimensionality. In other words, in the current diffusion process and its inverse, the dimensionality of the noise-added data is always the same as that of the original data. When the data dimensionality is high, this leads to high model training cost and high learning difficulty; moreover, because of this dimension-preserving property, models built on the diffusion process and its inverse take a long time to train on high-dimensional data and are difficult to fit. An effective scheme for solving these problems is therefore urgently needed.
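For reference, the fixed-dimension diffusion process described above can be sketched as follows (a minimal illustration rather than code from this patent; the linear variance schedule and its values are assumptions):

```python
import numpy as np

def forward_diffusion(x0, steps=10, beta_start=1e-4, beta_end=0.02):
    """Standard fixed-dimension forward diffusion: each iteration mixes a
    little more Gaussian noise into the current sample. The shape of the
    noisy sample never changes, so it always matches the original data."""
    betas = np.linspace(beta_start, beta_end, steps)
    x_t = np.asarray(x0, dtype=float).copy()
    trajectory = [x_t]
    for beta in betas:
        noise = np.random.randn(*x_t.shape)
        x_t = np.sqrt(1.0 - beta) * x_t + np.sqrt(beta) * noise
        trajectory.append(x_t)
    return trajectory

trajectory = forward_diffusion(np.random.rand(256), steps=10)
# Every intermediate sample keeps the original 256 dimensions -- exactly
# the property that makes high-dimensional training expensive.
assert all(x.shape == (256,) for x in trajectory)
```

This dimension-preserving behavior is the property that the variable-dimension scheme of this specification is designed to relax.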
Disclosure of Invention
In view of this, the embodiments of the present specification provide a generative model training method. One or more embodiments of the present specification also relate to a generative model training apparatus, an image processing method, an image processing apparatus, another image processing method, another image processing apparatus, another generative model training method, another generative model training apparatus, a computing device, a computer-readable storage medium, and a computer program, so as to solve the technical defects in the prior art.
According to a first aspect of the embodiments of this specification, a generative model training method is provided. Given a training data set, the model may learn an input/output relationship based on the preset algorithm's independence assumptions, and may then, for a given input feature, output a processed feature that satisfies the use requirement. The method comprises the following steps:
acquiring an original image;
performing diffusion processing on the original image, and performing attenuation processing on the image components of the original image, to obtain a noise-added image set;
determining a noise-added image according to the original image and the noise-added image set, and inputting the noise-added image into an initial generation model for processing to obtain a restored image;
and performing parameter adjustment on the initial generation model based on the restored image and the original image until a target generation model satisfying a training stop condition is obtained.
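The four steps above can be sketched as a toy training loop (everything here is an illustrative assumption: the linear "model", the noise schedule, the MSE objective, and the fixed step budget standing in for the training stop condition):

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(original, t, lam=0.95, sigma=0.3):
    """Simplified diffusion with attenuation: each of the t iterations
    attenuates the image by lam and adds Gaussian noise of scale sigma."""
    x = original.copy()
    for _ in range(t):
        x = lam * x + sigma * rng.standard_normal(x.shape)
    return x

original = rng.random(64)                 # step 1: acquire an "original image"
W = rng.standard_normal((64, 64)) * 0.01  # initial generation model (linear)
W0 = W.copy()

for _ in range(500):
    noisy = add_noise(original, int(rng.integers(1, 5)))  # step 2: noise-added image
    restored = W @ noisy                                  # step 3: restored image
    grad = np.outer(restored - original, noisy)           # gradient of 0.5*MSE
    W -= 1e-3 * grad                                      # step 4: parameter adjustment

err_before = np.linalg.norm(W0 @ add_noise(original, 3) - original)
err_after = np.linalg.norm(W @ add_noise(original, 3) - original)
assert err_after < err_before  # the trained "model" restores the original better
```

A real implementation would of course replace the linear map with a deep network and condition it on the iteration step, but the loop structure mirrors the four claimed steps.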
According to a second aspect of embodiments herein, there is provided a generative model training apparatus comprising:
an acquisition module configured to acquire an original image;
the processing module is configured to perform diffusion processing on the original image and perform attenuation processing on image components of the original image to obtain a noise-added image set;
the determining module is configured to determine a noise-added image according to the original image and the noise-added image set, and input the noise-added image into an initial generation model for processing to obtain a restored image;
a training module configured to perform parameter adjustment on the initial generation model based on the restored image and the original image until a target generation model satisfying a training stop condition is obtained.
According to a third aspect of embodiments of the present specification, there is provided an image processing method including:
acquiring an image to be processed uploaded by a user terminal;
inputting the image to be processed into the target generation model trained by the above method for processing, to obtain a target image;
determining a target object based on the target image, and loading object information corresponding to the target object;
and sending the object information to the user terminal.
According to a fourth aspect of embodiments of the present specification, there is provided an image processing apparatus comprising:
the image acquisition module is configured to acquire an image to be processed uploaded by a user terminal;
a model processing module configured to input the image to be processed into the target generation model trained by the above method for processing, to obtain a target image;
the object determining module is configured to determine a target object based on the target image and load object information corresponding to the target object;
a sending information module configured to send the object information to the user terminal.
According to a fifth aspect of embodiments herein, there is provided another image processing method including:
acquiring an initial shopping search image uploaded by a user terminal;
inputting the initial shopping search image into the target generation model trained by the above method for processing, to obtain a target shopping search image corresponding to the initial shopping search image;
determining a related commodity based on the target shopping search image, and loading commodity information corresponding to the related commodity;
and sending the commodity information to the user terminal, wherein the user terminal generates a commodity recommendation interface based on the commodity information and displays the commodity recommendation interface.
According to a sixth aspect of embodiments herein, there is provided another image processing apparatus including:
the image acquisition module is configured to acquire an initial shopping search image uploaded by a user terminal;
an input model module configured to input the initial shopping search image into the target generation model trained by the above method for processing, to obtain a target shopping search image corresponding to the initial shopping search image;
the loading information module is configured to determine a related commodity based on the target shopping search image and load commodity information corresponding to the related commodity;
and the information sending module is configured to send the commodity information to the user terminal, wherein the user terminal generates a commodity recommendation interface based on the commodity information and displays the commodity recommendation interface.
According to a seventh aspect of the embodiments of the present specification, there is provided another generative model training method applied to a server, where the generative model is a machine learning model, including:
receiving an original image uploaded by a model demand end;
performing diffusion processing on the original image, and performing attenuation processing on the image component of the original image to obtain a noise-added image set;
determining a noise-added image according to the original image and the noise-added image set, and inputting the noise-added image into an initial generation model for processing to obtain a restored image;
adjusting parameters of the initial generation model based on the restored image and the original image until a target generation model satisfying a training stop condition is obtained;
determining model parameters corresponding to the target generation model, and feeding back the model parameters to the model demand side.
According to an eighth aspect of the embodiments of the present specification, there is provided another generative model training device applied to a server, where the generative model is a machine learning model, including:
the image receiving module is configured to receive an original image uploaded by a model demand end;
the image processing module is configured to perform diffusion processing on the original image and perform attenuation processing on image components of the original image to obtain a noise-added image set;
the image determining module is configured to determine a noise-added image according to the original image and the noise-added image set, and input the noise-added image into an initial generation model for processing to obtain a restored image;
a model training module configured to perform parameter adjustment on the initial generation model based on the restored image and the original image until a target generation model satisfying a training stop condition is obtained;
and the parameter sending module is configured to determine model parameters corresponding to the target generation model and feed the model parameters back to the model demand side.
According to a ninth aspect of embodiments herein, there is provided a computing device comprising:
a memory and a processor;
the memory is used for storing computer-executable instructions, and the processor is used for implementing the steps of the above method when executing the computer-executable instructions.
According to a tenth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the above-described method.
According to an eleventh aspect of embodiments herein, there is provided a computer program, wherein the computer program, when executed in a computer, causes the computer to perform the steps of the above method.
In order to complete model training quickly and efficiently and to reduce the difficulty of fitting a model, the generative model training method provided in an embodiment of this specification may proceed as follows: after an original image is acquired, diffusion processing is performed on it while its image components are attenuated, yielding a diffused, dimension-reduced noise-added image set; a noise-added image usable for model training is selected according to the original image and the noise-added image set; the noise-added image is input into an initial generation model for processing to obtain a restored image; and finally the parameters of the initial generation model are adjusted based on the original image and the restored image until a target generation model satisfying a training stop condition is obtained. By training the generation model through a variable-dimension diffusion process, the method improves training speed while reducing model fitting difficulty, so that a high-precision generation model can be trained quickly and efficiently.
Drawings
FIG. 1 is a schematic diagram of a generative model training method provided in one embodiment of the present specification;
FIG. 2 is a flow diagram of a generative model training method provided in one embodiment of the present specification;
FIG. 3 is a schematic structural diagram of a generative model training apparatus provided in an embodiment of the present specification;
FIG. 4 is a flowchart of an image processing method provided in one embodiment of the present specification;
FIG. 5 is a schematic structural diagram of an image processing apparatus provided in an embodiment of the present specification;
FIG. 6 is a flowchart of another image processing method provided in one embodiment of the present specification;
FIG. 7 is a schematic structural diagram of another image processing apparatus provided in an embodiment of the present specification;
FIG. 8 is a flowchart of another generative model training method provided in one embodiment of the present specification;
FIG. 9 is a schematic structural diagram of another generative model training apparatus provided in an embodiment of the present specification;
FIG. 10 is a process flow diagram of a generative model training method provided in an embodiment of the present specification;
FIG. 11 is a block diagram of a computing device provided in an embodiment of the present specification.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of this specification. This specification may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; those skilled in the art may make similar extensions without departing from its spirit and scope.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments to describe various information, this information should not be limited by these terms, which are only used to distinguish one type of information from another. For example, without departing from the scope of one or more embodiments of this specification, a "first" may also be referred to as a "second," and similarly a "second" as a "first." The word "if," as used herein, may be interpreted as "when," "upon," or "in response to determining," depending on the context.
First, the terms used in one or more embodiments of this specification are explained.
Generative model: a model from which samples can be drawn according to a given data distribution.
Diffusion process: a stochastic process that gradually adds noise to data; it has an inverse process that restores the data distribution, and can be used to build generative models.
Machine learning model: given a training data set, such a model can learn an input/output relationship based on a preset algorithm's assumptions, and can then, for a given input feature, output features that satisfy the use requirement after processing. An example is a probabilistic model, which, given a training data set, learns the input/output probability distribution under a feature conditional independence assumption; for a given input x, the model then outputs the y with the highest probability.
In this specification, a generative model training method is provided. One or more embodiments of the present specification also relate to a generative model training apparatus, an image processing method, an image processing apparatus, another image processing method, another image processing apparatus, another generative model training method, another generative model training apparatus, a computing device, a computer-readable storage medium, and a computer program, which are described in detail one by one in the following embodiments.
In the prior art, throughout the iterations of the diffusion process and its inverse, the dimensionality of the noise-added data is always the same as that of the original data, so when a diffusion-based generation model is trained, its input and output dimensionalities equal the original data dimensionality. As the data dimensionality grows, both the per-iteration training cost and the fitting difficulty of the model grow. For a typical image generation task in particular, the data dimensionality grows with the square of the image size, so the fitting difficulty on large-size image data sets increases markedly. One current way to avoid the effects of fixed dimensionality is to additionally train the inverse of a lower-dimensional diffusion process, namely a diffusion process over low-dimensional principal components of the original data. When sampling with the inverse process, the low-dimensional process is iterated for a certain number of steps, the low-dimensional noise-added data is then up-dimensioned and given a certain noise compensation, and finally the original-dimension inverse process is iterated to generate the final sample. In this method, however, the distribution obtained after up-dimensioning and noise compensation differs, possibly substantially, from the distribution of the original diffusion process at that iteration step, which easily degrades the quality of the generated samples. Under this limitation, the method can hardly perform multiple dimension changes, and it is difficult to reduce higher-dimensional data effectively.
In view of this, referring to the schematic diagram shown in fig. 1, in order to complete model training quickly and efficiently and to reduce model fitting difficulty, the generative model training method provided in an embodiment of this specification may proceed as follows: after an original image is obtained, diffusion processing is performed on it while its image components are attenuated, yielding a diffused, dimension-reduced noise-added image set; a noise-added image usable for model training is selected according to the original image and the noise-added image set; on this basis, the noise-added image is input into an initial generation model for processing to obtain a restored image; and the parameters of the initial generation model are adjusted based on the original image and the restored image until a target generation model satisfying a training stop condition is obtained. By training the generation model through a variable-dimension diffusion process, the method improves training speed while reducing fitting difficulty, so that a high-precision generation model can be trained quickly and efficiently.
It should be noted that the user characteristic information or user data referred to in this application is information and data authorized by the user or sufficiently authorized by all parties. User characteristic information includes, but is not limited to, personal information and preference information; user data includes, but is not limited to, data for analysis, stored data, and displayed data such as images. The collection, use, and processing of such data must comply with the relevant laws, regulations, and standards of the relevant countries and regions, and a corresponding operation entrance is provided for the user to grant or deny authorization.
Fig. 2 shows a flowchart of a generative model training method provided according to an embodiment of the present specification, which specifically includes the following steps.
In step S202, an original image is acquired.
The generative model training method provided by this embodiment can be applied to image processing scenarios, that is, to training a generation model usable for processing images. The trained generation model may be one for image sharpness processing, image size adjustment, image color adjustment, and so on. This embodiment describes the training method by taking a generation model for image sharpness processing as an example; other scenarios can refer to the same or corresponding descriptions in this embodiment and are not detailed here.
Specifically, the original image is an unprocessed image; it is processed to generate a sample pair for training the generation model, where the original image in the pair serves as the label and a noise-added image obtained by processing the original image serves as the sample.
On this basis, when training a generation model for image processing, in order to improve training speed and reduce fitting difficulty, the unprocessed original image can be obtained first, so that it can subsequently be diffused to achieve dimensionality reduction.
In step S204, diffusion processing is performed on the original image, and attenuation processing is performed on the image components of the original image, to obtain a noise-added image set.
Specifically, after the original image is obtained, it first needs to undergo diffusion processing so that it is turned into a noise-added image set for the subsequent training of the generation model. In this process, to reduce the impact of the images before and after diffusion having the same dimensionality, the image components of the original image can be attenuated in each iteration of the diffusion processing; when a component has been attenuated to its minimum, the dimensionality of the image is reduced. This continues until the dimension-reduced noise-added images meet the requirement, and the images obtained during diffusion are combined into a noise-added image set for subsequently training the generation model.
Diffusion processing specifically refers to adding Gaussian noise to the original image, turning the diffused image into a noise-added image that differs from the original in its representation; over the diffusion iterations, multiple noise-added images are obtained. Correspondingly, an image component is an orthogonal component of the original image after orthogonal decomposition; for example, an image contains low-frequency and high-frequency components. The noise-added image set is the set of noise-added images obtained by diffusing the original image while attenuating its image components. Attenuating the image components in each diffusion iteration makes it possible to reduce the dimensionality of the image, so that the original image is processed into noise-added images forming a set, allowing the subsequent model training to combine images of different dimensionalities; the dimensionality of the images in the noise-added image set may be smaller than or equal to that of the original image.
On this basis, after the original image is obtained, in order to train a generation model meeting the use requirement, diffusion processing is performed on the original image first. To reduce the impact of unchanged image dimensionality, component attenuation can be carried out synchronously in each diffusion iteration until the image component is attenuated to be sufficiently small, at which point a dimension-reduced image is obtained; thereafter, each subsequent iteration can be approximated with a lower-dimensional diffusion process, achieving image dimensionality reduction. Repeating this process realizes multiple reductions in dimensionality and yields a noise-added image set, with which the generation model corresponding to the inverse process can subsequently be trained.
In practical applications, when attenuating image components, the attenuation of the different components can be controlled by preset hyperparameters, so that each image component is attenuated in a different iteration and image dimensionality reduction is achieved. Attenuation can be realized by decreasing the attenuation hyperparameter of an image component from 1 to near 0 before the iteration step at which dimensionality reduction is required. That is, attenuation causes the original image to lose some of its components; the scheme of this embodiment controls the attenuation of these components so that the error introduced by dimensionality reduction is controllable, and dimensionality reduction is completed only once the error is guaranteed to be sufficiently small, yielding a noise-added image that meets the use requirement. For example, a 256-dimensional original image whose components are attenuated during diffusion yields a 128-dimensional image, and so on, the final noise-added image being 64-dimensional.
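The hyperparameter behaviour described here can be sketched as a per-component schedule (the linear decay is an assumption; the embodiment only requires the hyperparameter to fall from 1 to near 0 before the dimension-reduction step):

```python
import numpy as np

def attenuation_schedule(T_k, total_steps):
    """Attenuation hyperparameter lambda_{i,t} for one image component: it
    decays from 1 to 0 over the first T_k iterations (the step at which
    this component's dimension is dropped) and stays at 0 afterwards."""
    lam = np.zeros(total_steps)
    lam[:T_k] = np.linspace(1.0, 0.0, T_k)
    return lam

# Two component groups dropped at steps 10 and 20, mirroring an
# illustrative 256 -> 128 -> 64 dimensionality-reduction sequence.
first_group = attenuation_schedule(T_k=10, total_steps=30)
second_group = attenuation_schedule(T_k=20, total_steps=30)
assert first_group[0] == 1.0 and np.all(first_group[10:] == 0.0)
assert np.all(second_group[20:] == 0.0)  # later group survives longer
```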
In a specific implementation, attenuating the image components while diffusing the original image can be realized by the following formula (1):
y_{k,t} = D_k x_{k,t},  where  x_{k,t} = Σ_{i≥k} (λ_{i,t} v_i + σ_{i,t} z_i)    (1)
Here, subscript t denotes the current iteration step of the diffusion process, and subscript k denotes the number of dimensionality reductions performed; y_{k,t} denotes the noise-added image obtained after the original image has undergone k dimensionality reductions and t noise additions during diffusion; D_k denotes the dimensionality-reduction operator; x_{k,t}, corresponding to y_{k,t}, denotes the image that has not been dimension-reduced but whose components have been attenuated; v_i denotes the i-th component of the original image; λ_{i,t} denotes the degree of attenuation of v_i after t iterations; z_i denotes the projection of standard Gaussian noise onto the subspace of v_i; and σ_{i,t} denotes the standard deviation of the noise added to v_i after t iterations. In formula (1), the summation index begins at i = k and excludes i = 0, 1, …, k − 1, since those λ_{i,t} are considered close enough to 0 after T_k iterations, meaning the components v_i have been attenuated enough that a low-dimensional approximation can be made.
Through formula (1), after the original image is obtained, k dimensionality reductions can be performed alongside t diffusion iterations, producing the noise-added images that will later form the noise-added image set. Meanwhile, during diffusion and attenuation, part of the orthogonal components of the added Gaussian noise is lost in the dimensionality reduction, so the image noising parameters are recorded during diffusion; when the trained target generation model later generates an image, the lost noise can be added back to obtain a more accurate restored image. To this end, a Gaussian noise can be resampled using σ_{i,t} from formula (1) and added to the noise-added image when the generation model generates an image, realizing noise compensation.
For example, a 256-dimensional original image is obtained first and normalized to obtain an intermediate image, which is then orthogonally decomposed to obtain its orthogonal components. During diffusion, Gaussian noise is added to the 256-dimensional intermediate image through the computation of formula (1), while in each noising iteration different orthogonal components are attenuated; once a component's attenuation value is close to 0, the 256-dimensional image can be reduced to a 128-dimensional one. Continuing diffusion and component attenuation in this way finally yields a 16-dimensional noise-added image for the subsequent training of the generation model.
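A numerical sketch of this construction, with an arbitrary orthonormal basis standing in for the orthogonal components and a dimension-reduction operator that simply drops the first k component coefficients (both illustrative assumptions; the patent does not fix a particular basis):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
# Orthonormal basis whose columns play the role of the components v_i.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

def noisy_image(x0, k, lam, sigma):
    """Build y_{k,t}: components 0..k-1 are treated as fully attenuated and
    dropped; each remaining coefficient v_i is attenuated by lam[i] and
    perturbed by Gaussian noise of standard deviation sigma[i]."""
    v = Q.T @ x0                   # coefficients of x0 in the basis
    y = np.empty(d - k)            # dimensionality after dropping k components
    for j, i in enumerate(range(k, d)):
        z = rng.standard_normal()  # noise projected onto component i
        y[j] = lam[i] * v[i] + sigma[i] * z
    return y                       # D_k here: keep coefficients i >= k only

x0 = rng.random(d)
y = noisy_image(x0, k=3, lam=np.full(d, 0.8), sigma=np.full(d, 0.1))
assert y.shape == (5,)  # reduced from 8 to 5 dimensions
```

Recording the sigma values used here is what later enables the noise compensation described above.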
Further, when diffusing the original image and attenuating its components, in order to ensure that the dimensionality reduction succeeds, the original image needs to be normalized and its image components determined before diffusion and attenuation. In this embodiment, the specific implementation is as follows:
normalizing pixel values corresponding to pixel points in the original image to obtain an intermediate image, and performing orthogonal decomposition processing on the intermediate image to obtain an image component; and performing diffusion processing on the intermediate image, and performing attenuation processing on the image component to obtain the noise-added image set.
Specifically, the intermediate image refers to the vector expression of the original image in a high-dimensional Euclidean space, and the image components refer to the orthogonal components obtained by orthogonally decomposing the intermediate image. There are multiple image components, one of which is attenuated in each iteration, so that the original image can be reduced in dimension multiple times.
Based on this, after the original image is obtained, noise is to be added to it while different image components are attenuated during the noise-adding process to achieve dimension reduction. The pixel values of the pixel points in the original image may be normalized to obtain the vector expression in a high-dimensional Euclidean space, i.e., the intermediate image. The intermediate image is simultaneously decomposed orthogonally to obtain its multiple image components, such as high-frequency or low-frequency components. Diffusion processing is then performed on the intermediate image while the image components are attenuated in the diffusion iterations to achieve dimension reduction, yielding a noisy image set composed of noisy images whose dimensions are less than or equal to that of the original image, for use in subsequent model training.
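A minimal sketch of the normalization and orthogonal-decomposition step, assuming an orthonormal DCT-II basis as the set of components (the patent does not fix a particular basis; low rows act as low-frequency components and high rows as high-frequency ones):

```python
import numpy as np

img = np.arange(16, dtype=np.float64).reshape(4, 4)   # toy "original image" pixel values

# Normalize pixel values to [0, 1] and flatten into a high-dimensional Euclidean vector.
vec = (img / 255.0).ravel()

# Orthogonal decomposition: project onto an orthonormal DCT-II basis, whose rows
# play the role of the image components.
n = vec.size
j = np.arange(n)
basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j[None, :] + 1) * j[:, None] / (2 * n))
basis[0] /= np.sqrt(2.0)

coeffs = basis @ vec                   # coefficient of each orthogonal component
reconstructed = basis.T @ coeffs       # orthonormality => exact reconstruction

print(np.allclose(reconstructed, vec))
```

Because the basis is orthonormal, projecting and reconstructing are exact inverses, which is what makes per-component attenuation well defined.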
In summary, to change the dimension of the image during diffusion, image-component attenuation can be applied in each diffusion iteration, so that the high-dimensional original image is converted into low-dimensional noisy images with Gaussian noise added. A noisy image set built from these low-dimensional noisy images facilitates subsequent training of the generative model, allows the model to learn noise compensation, and reduces the difficulty of model fitting.
Furthermore, when performing diffusion processing on the intermediate image and attenuation processing on the image components, different images and components are processed in different diffusion periods, so that the original image can be continuously noised and reduced in dimension to obtain a noisy image set that meets the use requirement. In this embodiment, the specific implementation is as follows:
determining an intermediate image corresponding to an ith diffusion period and an ith image component corresponding to the intermediate image, wherein i is a positive integer starting from 1;
adding ith noise to the intermediate image, and performing attenuation processing on the ith image component;
under the condition that the attenuation result of the ith image component is not smaller than a component threshold, determining, according to the diffusion processing result, a first noisy image with the same image dimension as the intermediate image, taking the first noisy image as the intermediate image, incrementing i by 1, and executing the step of determining the intermediate image corresponding to the ith diffusion period and the ith image component corresponding to the intermediate image;
under the condition that the attenuation result of the ith image component is smaller than the component threshold, determining, according to the diffusion processing result, a second noisy image with an image dimension smaller than that of the intermediate image, taking the second noisy image as the intermediate image, incrementing i by 1, and executing the step of determining the intermediate image corresponding to the ith diffusion period and the ith image component corresponding to the intermediate image;
and forming the noise image set according to the first noise image and the second noise image under the condition that the diffusion processing and the attenuation processing meet the iteration stop condition.
Specifically, the diffusion period refers to a period in which diffusion processing is performed on the intermediate image, each period corresponding to one diffusion iteration; correspondingly, the ith image component refers to the image component to be attenuated in the ith diffusion period, and the ith noise refers to the Gaussian noise to be added to the intermediate image in the ith diffusion period. The first noisy image refers to an image that has undergone diffusion processing without a change in image dimension, while the second noisy image refers to an image that has undergone diffusion processing with its image dimension reduced. The iteration stop condition refers to the stop condition for the diffusion processing and image-component attenuation of the intermediate image; when it is met, all the first and second noisy images obtained can form the noisy image set, facilitating image sampling based on it and subsequent model training with the sampled noisy images.
Based on this, the intermediate image to be diffused in the ith diffusion period and the ith image component to be attenuated in that period are first determined. Next, the ith noise is added to the intermediate image while the ith image component is attenuated. If the attenuation result of the ith image component is not smaller than the component threshold, the current diffusion period does not meet the dimension-reduction condition; the diffused intermediate image is then taken as a first noisy image, whose image dimension is the same as that of the intermediate image. Thereafter, the first noisy image serves as the intermediate image of the (i+1)th diffusion period, and diffusion and component attenuation are performed again. In this process, whenever the noisy image obtained in a diffusion period has the same image dimension as the original image, it is taken as a first noisy image.
This continues until the attenuation result of the image component determined in some diffusion period falls below the component threshold, indicating that the current period meets the dimension-reduction condition; the diffused intermediate image is then taken as a second noisy image, whose image dimension is smaller than that of the intermediate image. Thereafter, if further dimension reduction is required, the second noisy image may serve as the intermediate image of the next diffusion period, and diffusion and component attenuation are performed again. In this process, whenever the noisy image obtained in a diffusion period has an image dimension smaller than that of the original image, it is taken as a second noisy image.
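The threshold-driven branching above can be sketched as follows; the per-cycle decay values and the mean-pooling reduction are illustrative assumptions, with the decay factor standing in for the attenuation result that is compared against the component threshold:

```python
import numpy as np

rng = np.random.default_rng(1)
THRESHOLD = 0.3                        # component threshold that triggers dimension reduction

x = rng.standard_normal(8)             # intermediate image for the 1st diffusion period
decays = [0.9, 0.7, 0.2, 0.9]          # attenuation applied to the i-th component per period
first_set, second_set = [], []

for decay in decays:
    x = decay * x + 0.05 * rng.standard_normal(x.shape)   # add the i-th noise
    if decay >= THRESHOLD:             # attenuation result not below threshold: keep dims
        first_set.append(x)            # first noisy image, same dimension
    else:                              # attenuation fell below threshold: reduce dims
        x = x.reshape(-1, 2).mean(axis=1)
        second_set.append(x)           # second noisy image, smaller dimension

noisy_image_set = first_set + second_set
print(len(first_set), len(second_set), x.shape[0])
```

With these decays, three periods keep the dimension and one halves it, mirroring the x1..x4 example below where only one of the four iterations reduces the dimension.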
In this embodiment, the maximum number of iteration steps in the diffusion process is 4, and the generation of the noisy image set is described taking a single dimension change as an example; the diffusion process in practical applications may refer to the same or corresponding description in this embodiment and is not detailed here.
First, the normalized original image x0 is acquired; in the 1st iteration period, noise is added to x0 while its image components are attenuated, it is determined that the noised and component-attenuated image does not meet the dimension-reduction condition, and the noisy image x1 is obtained from the diffusion result. Second, noise is added to x1 while image components are attenuated, the dimension-reduction condition is again not met, and the noisy image x2 is obtained. Third, noise is added to x2 while image components are attenuated, the dimension-reduction condition is met this time, and the noisy image x3 is obtained. Finally, noise is added to x3 while image components are attenuated, the dimension-reduction condition is not met, and the noisy image x4 is obtained. The image dimensions of x1 and x2 are the same as that of the original image x0, while the dimensions of x3 and x4 are smaller than that of the original image and equal to each other.
At this point, the noisy images x1, x2, x3, and x4 can be combined into the noisy image set corresponding to the original image x0, to be used later together with x0 to determine the noisy images for training the generative model.
In conclusion, generating the noisy image set from both the noisy images with changed image dimensions and those without changes allows the generative model, at subsequent sampling time, to be trained with noisy images of different dimensions, so that it learns the ability to recover the original dimension and its prediction accuracy improves.
And S206, determining a noise image according to the original image and the noise image set, and inputting the noise image into an initial generation model for processing to obtain a restored image.
Specifically, after the noisy image set corresponding to the original image is obtained, in order to train a generation model that meets the use requirement and has dimension-raising and noise-compensation capabilities, a noisy image may be determined by combining the original image and the noisy image set; the initial generation model is then trained with the noisy image as the sample and the original image as the label, until a target generation model meeting the use requirement is obtained.
The noisy image refers to an image, obtained by combining the noisy image set and the original image, that can serve as a training sample; it may be any image in the noisy image set or an image obtained by processing such an image together with the original image. The initial generation model is a model capable of processing images that has not yet been sufficiently trained at this stage: it has some prediction ability but low accuracy. That is, the initial generation model is a generation model that has completed pre-training. After the noisy image is determined from the noisy image set and the original image, processing it with the generation model yields a restored image similar to the original. For example, an original image at definition level 1 is noised into a noisy image at level 8; inputting that noisy image into the trained generation model yields a restored image at level 2, which is close to the level-1 original. Correspondingly, the restored image refers to the image obtained by the initial generation model performing the inverse of the diffusion process on the noisy image, and it has the same image dimension as the original image.
That is, to train a generation model with dimension-raising and noise-compensation capabilities, a noisy image may be determined by combining the original image and the noisy image set and input into the initial generation model to obtain a restored image; the accuracy of the initial generation model's inverse diffusion processing can then be analyzed by comparing the restored image with the original image, facilitating subsequent parameter adjustment and yielding a more accurate target generation model.
Further, when determining a noisy image, in order to reduce the difficulty of model fitting and improve the prediction accuracy of the model, a noisy image of any image dimension may be determined in a random sampling manner, and the noisy image is formed and used in combination with an original image, in this embodiment, the specific implementation manner is as follows:
randomly sampling a first target noise adding image in the noise adding image set, and performing diffusion processing on the original image based on the first target noise adding image to obtain a second target noise adding image; and taking the first target noise-added image and the second target noise-added image as the noise-added images.
Specifically, the first target noisy image is a noisy image randomly sampled from the noisy image set; it may have the same image dimension as the original image or a smaller one. Correspondingly, the second target noisy image refers to a noisy image obtained by diffusing the original image, the diffusion being performed with the Gaussian noise related to the first target noisy image.
Based on this, after the noisy image set is obtained, in order to train a generation model with higher prediction accuracy and stronger prediction capability, the noisy image set may be sampled randomly first, so that a first target noisy image may be sampled from the set, and in order to achieve training of the model, an original image may be subjected to diffusion processing based on the first target noisy image, so that a diffused second target noisy image is obtained, and then the first target noisy image and the second target noisy image are used as noisy images for input model training.
Following the above example, after the noisy images x1, x2, x3, and x4 are obtained, one of them may be randomly sampled. If x1 or x2 is sampled, the sampled noisy image has the same image dimension as the original image x0; x0 is then diffused according to x1 or x2 to obtain a diffused image xi, and xi together with x1 or x2 is input into the generation model for processing to predict x0, obtaining a restored image close to x0 for subsequent model parameter adjustment.
If x3 or x4 is sampled, the sampled noisy image has an image dimension smaller than that of the original image x0; x0 is then diffused according to x3 or x4 to obtain a diffused image xi, and xi together with x3 or x4 is input into the generation model for processing to predict x0, obtaining a restored image close to x0 for subsequent model parameter adjustment.
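A hedged sketch of this sampling step, with toy vectors standing in for the real images (the diffusion coefficients and the averaging reduction are assumptions of this rewrite):

```python
import numpy as np

rng = np.random.default_rng(2)

# Assume noisy_set holds the noisy images x1..x4 from the diffusion process
# (x1, x2 keep the original 8 dims; x3, x4 were reduced to 4 dims).
noisy_set = [rng.standard_normal(8), rng.standard_normal(8),
             rng.standard_normal(4), rng.standard_normal(4)]
x0 = rng.standard_normal(8)            # normalized original image

idx = rng.integers(len(noisy_set))     # random sampling: first target noisy image
first_target = noisy_set[idx]

# Diffuse the original toward the sampled image's dimension: second target noisy image.
x = 0.9 * x0 + 0.1 * rng.standard_normal(8)
if first_target.shape[0] < x0.shape[0]:
    x = x.reshape(-1, 2).mean(axis=1)
second_target = x

training_pair = (first_target, second_target)  # fed to the initial model; x0 is the label
print(first_target.shape[0], second_target.shape[0])
```

Whichever image is sampled, the two targets end up with matching dimensions, so the model always receives a consistent training pair.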
In conclusion, noisy images with different image dimensions are obtained in a random sampling mode and are used for model training in combination with original images, so that the model can learn the capability of recovering the dimensions, and the model prediction precision can be effectively improved.
And S208, performing parameter adjustment on the initial generation model based on the restored image and the original image until a target generation model meeting a training stop condition is obtained.
Specifically, after the restored image output by the generated model is obtained, the prediction capability of the initial generated model can be further analyzed according to the comparison between the restored image and the original image, so that when the prediction capability does not meet the requirement, the parameter can be adjusted until the target generated model meeting the training stop condition is obtained.
The training stopping condition may be a loss value comparison condition, or an iteration number condition, or a verification set verification condition, and in practical application, may be selected according to a requirement, and this embodiment is not limited herein.
Further, when adjusting the model parameters, the adjustment may be completed by calculating the loss value, and in this embodiment, the specific implementation manner is as follows:
calculating a model loss value corresponding to the initial generation model according to the restored image and the original image; and under the condition that the model loss value is smaller than a preset loss value threshold, determining that the initial generation model meets a training stop condition, and taking the initial generation model as the target generation model.
Based on this, after the restored image is obtained, the model loss value for the restored image and the original image can be calculated with a preset loss function and compared with the preset loss value threshold. If the loss value is greater than the threshold, the prediction accuracy of the current model does not meet the requirement, and new images need to be collected to continue training and parameter adjustment; if the loss value is less than or equal to the threshold, the prediction accuracy meets the requirement, and the current model can be taken as the target generation model.
In practical applications, the calculation of the loss value may be implemented by using a cross entropy loss function, an absolute value loss function, a square loss function, and the like, and the embodiment is not limited herein.
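For instance, using the square loss mentioned above, the parameter-adjustment check reduces to comparing a mean squared error against the preset threshold (the concrete numbers here are illustrative):

```python
import numpy as np

LOSS_THRESHOLD = 0.01                          # preset loss value threshold

def square_loss(restored, original):
    """Mean squared error between the restored image and the original (one of the
    loss choices mentioned above; absolute error works the same way)."""
    return float(np.mean((restored - original) ** 2))

original = np.array([0.2, 0.4, 0.6, 0.8])      # label: normalized original image
restored = np.array([0.21, 0.39, 0.61, 0.79])  # model output close to the label

loss = square_loss(restored, original)
meets_stop = loss < LOSS_THRESHOLD             # training stop condition
print(loss, meets_stop)
```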
In order to complete model training quickly and efficiently while reducing the difficulty of model fitting, the generative model training method provided in this embodiment of the specification performs diffusion processing on the original image after it is acquired, while attenuating the image components of the original image during diffusion, to obtain a diffused and dimension-reduced noisy image set. A noisy image usable for model training is screened out from the original image and the noisy image set, input into the initial generation model for processing to obtain a restored image, and finally the parameters of the initial generation model are adjusted based on the original image and the restored image until a target generation model satisfying the training stop condition is obtained. The method trains the generative model through a variable-dimension diffusion process, improving training speed and reducing the difficulty of model fitting, thereby achieving high-precision generative model training quickly and efficiently.
Corresponding to the above method embodiment, the present specification further provides an embodiment of a generative model training device, and fig. 3 illustrates a schematic structural diagram of a generative model training device provided in an embodiment of the present specification. As shown in fig. 3, the apparatus includes:
an acquisition module 302 configured to acquire an original image;
a processing module 304, configured to perform diffusion processing on the original image and perform attenuation processing on image components of the original image to obtain a noisy image set;
a determining module 306, configured to determine a noisy image according to the original image and the noisy image set, and input the noisy image to an initial generation model for processing, so as to obtain a restored image;
a training module 308 configured to perform parameter adjustment on the initial generative model based on the restored image and the original image until a target generative model satisfying a training stop condition is obtained.
In an optional embodiment, the processing module 304 is further configured to:
normalizing pixel values corresponding to pixel points in the original image to obtain an intermediate image, and performing orthogonal decomposition processing on the intermediate image to obtain an image component; and performing diffusion processing on the intermediate image, and performing attenuation processing on the image component to obtain the noise-added image set.
In an optional embodiment, the processing module 304 is further configured to:
determining an intermediate image corresponding to an ith diffusion period and an ith image component corresponding to the intermediate image; adding an ith noise to the intermediate image, and performing attenuation processing on the ith image component; under the condition that the attenuation result of the ith image component is not smaller than a component threshold, determining, according to the diffusion processing result, a first noisy image with the same image dimension as the intermediate image, taking the first noisy image as the intermediate image, incrementing i by 1, and executing the step of determining the intermediate image corresponding to the ith diffusion period and the ith image component corresponding to the intermediate image; under the condition that the attenuation result of the ith image component is smaller than the component threshold, determining, according to the diffusion processing result, a second noisy image with an image dimension smaller than that of the intermediate image, taking the second noisy image as the intermediate image, incrementing i by 1, and executing the step of determining the intermediate image corresponding to the ith diffusion period and the ith image component corresponding to the intermediate image; and forming the noisy image set from the first noisy images and the second noisy images under the condition that the diffusion processing and the attenuation processing meet the iteration stop condition.
In an optional embodiment, the determining module 306 is further configured to:
randomly sampling a first target noise adding image in the noise adding image set, and performing diffusion processing on the original image based on the first target noise adding image to obtain a second target noise adding image; and taking the first target noise-added image and the second target noise-added image as the noise-added images.
In an optional embodiment, the training module 308 is further configured to:
calculating a model loss value corresponding to the initial generation model according to the restored image and the original image; and under the condition that the model loss value is smaller than a preset loss value threshold, determining that the initial generation model meets a training stop condition, and taking the initial generation model as the target generation model.
In an alternative embodiment, the diffusion process and the attenuation process are calculated by the following equations:
y_{k,t} = D_k(x_{k,t}),  x_{k,t} = Σ_i (λ_{i,t} · v_i + σ_{i,t} · z_i)    (1)
wherein subscript t denotes the current iteration step of the diffusion process and subscript k the number of dimension reductions; y_{k,t} denotes the noisy image after k dimension reductions and t noise additions in the diffusion of the original image; D_k denotes the dimension-reduction operator; x_{k,t}, corresponding to y_{k,t}, denotes the original image without dimension reduction but with its image components attenuated; v_i denotes the ith component of the original image; λ_{i,t} denotes the degree of attenuation of v_i after t iterations; z_i denotes the projection of standard Gaussian noise onto the subspace where v_i lies; and σ_{i,t} denotes the standard deviation of the noise added to v_i after t iterations.
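A numeric sanity check of formula (1) under toy assumptions: the components v_i are taken as the standard basis of R^4, and the dimension-reduction operator D_k simply drops the component whose λ has (nearly) vanished:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy orthonormal components v_i: the standard basis of R^4.
n = 4
v = np.eye(n)
x0_coeffs = np.array([1.0, 0.5, -0.5, 0.25])   # projections of the original image onto each v_i

lam = np.array([0.9, 0.8, 0.4, 0.05])          # lambda_{i,t}: per-component attenuation at step t
sigma = np.array([0.1, 0.1, 0.2, 0.3])         # sigma_{i,t}: per-component noise scale at step t
z = rng.standard_normal(n)                     # z_i: Gaussian noise projected on each subspace

# x_{k,t}: attenuated and noised image before dimension reduction
x_kt = (lam * x0_coeffs) @ v + (sigma * z) @ v

# D_k: drop the component whose lambda has (nearly) vanished -- here the last one.
def D(x):
    return x[:-1]

y_kt = D(x_kt)
print(x_kt.shape[0], y_kt.shape[0])
```

Because v is the identity here, x_{k,t} is just the componentwise sum λ_{i,t}·v_i + σ_{i,t}·z_i, and y_{k,t} lives one dimension lower, matching the roles assigned to these symbols above.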
In order to complete model training quickly and efficiently while reducing the difficulty of model fitting, the generative model training device provided in this embodiment of the specification performs diffusion processing on the original image after it is acquired, while attenuating the image components of the original image during diffusion, to obtain a diffused and dimension-reduced noisy image set. A noisy image usable for model training is screened out from the original image and the noisy image set, input into the initial generation model for processing to obtain a restored image, and finally the parameters of the initial generation model are adjusted based on the original image and the restored image until a target generation model satisfying the training stop condition is obtained. The device trains the generative model through a variable-dimension diffusion process, improving training speed and reducing the difficulty of model fitting, thereby achieving high-precision generative model training quickly and efficiently.
The above is an exemplary scheme of the generative model training apparatus according to the present embodiment. It should be noted that the technical solution of the generative model training device and the technical solution of the generative model training method described above belong to the same concept, and details of the technical solution of the generative model training device, which are not described in detail, can be referred to the description of the technical solution of the generative model training method described above.
Fig. 4 is a flowchart illustrating an image processing method according to an embodiment of the present disclosure, which specifically includes the following steps.
Step S402, acquiring an image to be processed uploaded by a user terminal;
step S404, inputting the image to be processed into the target generation model in the method for processing to obtain a target image;
step S406, determining a target object based on the target image, and loading object information corresponding to the target object;
step S408, sending the object information to the user terminal.
Specifically, the user terminal specifically refers to a terminal held by a user with a commodity purchase demand, and includes but is not limited to a mobile phone, a computer, or a tablet computer. Correspondingly, the image to be processed specifically refers to an image that needs to be processed by the target generation model, and includes but is not limited to an image that needs to be subjected to definition adjustment, an image that needs to be subjected to size adjustment, or an image that needs to be subjected to color adjustment, and this embodiment is not limited in this respect. Correspondingly, the target object specifically refers to an object that can be recognized in the target image, and may be a human face, an article, a commodity, and the like, and correspondingly, the object information is related information of the target object, including but not limited to description information, a link, and the like.
Based on this, after the to-be-processed image uploaded by the user is acquired, in order to perform downstream business processing by using a clearer image or an image with a larger size or a more accurate color, the to-be-processed image may be input to the target generation model for processing, so as to obtain a target image. And then determining a target object by combining the target image, loading corresponding object information, and sending the object information to the user terminal for use.
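Steps S402 to S408 can be sketched as a simple serving flow; `model_process`, `detect_object`, and `lookup_info` are hypothetical stand-ins for the target generation model, the object recognizer, and the object-information store:

```python
def model_process(image):
    """Placeholder for target-generation-model inference (S404)."""
    return [p * 2 for p in image]

def detect_object(image):
    """Placeholder recognizer returning a target-object identifier (S406)."""
    return "product-123"

def lookup_info(object_id):
    """Placeholder object-information lookup (S406)."""
    return {"id": object_id, "description": "sample product",
            "link": "/items/" + object_id}

def handle_upload(image):
    target_image = model_process(image)           # S404: run the target generation model
    target_object = detect_object(target_image)   # S406: identify the target object
    return lookup_info(target_object)             # S408: object info for the user terminal

info = handle_upload([0.1, 0.2, 0.3])
print(info["id"])
```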
It should be noted that the processing performed by the generation model is the inverse of the diffusion process in the above model training procedure, i.e., a step-by-step inverse iteration; when the iteration reaches a step corresponding to a dimension reduction in the diffusion process, the corresponding dimension-raising compensation and noise compensation are required, thereby realizing the processing from the image to be processed to the target image.
Further, after the image to be processed is input to the target generation model, when the generation model performs inverse processing of the diffusion process, considering that the image to be processed may not reflect the real acquired content, the restoration may be completed by adding lost noise and performing dimension-increasing processing, and in this embodiment, the specific implementation manner is as follows:
inputting the image to be processed into the target generation model; performing dimensionality-increasing processing on the image to be processed to obtain a middle noise-added image; and carrying out noise compensation processing on the intermediate noise-added image to obtain the target image and output the target generation model.
Specifically, the dimension-raising processing refers to raising the lower dimension of the image to be processed until it matches the dimension in the real scene; correspondingly, the noise compensation processing refers to adding back to the image to be processed the noise that may have been lost.
Based on this, after the lower-dimensional image to be processed carrying Gaussian noise is obtained, for the convenience of downstream services it may be input into the target generation model, which raises its dimension to obtain an intermediate noisy image whose image dimension equals that of the image in the real scene, and then performs noise compensation on the intermediate noisy image. The dimension raising and noise adding of the inverse processing are thus completed, and the target image is finally obtained and output by the generation model for downstream use.
In practical applications, the dimension raising performed by the target generation model on the image to be processed can be understood as up-sampling, for example up-sampling an image into a larger image. Components lost during diffusion of the image to be processed can be gradually compensated by the target generation model in the iterations of the inverse processing. In addition, when the image to be processed cannot be raised in dimension, noise compensation may be performed on it directly.
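A minimal sketch of the dimension-raising step as nearest-neighbour up-sampling (one possible choice of operator; the patent does not specify a particular one):

```python
import numpy as np

def raise_dim(x):
    """Toy dimension-raising operator: repeat each entry, doubling the dimension."""
    return np.repeat(x, 2)

low = np.array([0.2, 0.4, 0.6, 0.8])      # low-dimensional image to be processed
mid_noisy = raise_dim(low)                # intermediate noisy image at the raised dimension
print(mid_noisy.shape[0])
```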
Furthermore, when performing noise compensation, in order to make the image processed by the model closer to the real image, compensation may be performed only for the noise lost from the image. In this embodiment, the specific implementation is as follows:
Determining image loss information corresponding to the intermediate noisy image; and carrying out noise compensation processing on the intermediate noise-added image according to the image loss information to obtain the target image.
Specifically, the image loss information refers to the information corresponding to the noise that may have been lost relative to the image in the real scene. Based on this, the image loss information corresponding to the intermediate noise-added image can be determined, that is, the noise lost during the diffusion processing of the image; noise compensation processing is then performed on the intermediate noise-added image based on the image loss information, so that the target image can be obtained and output by the model.
In practical applications, part of the noise is lost in the dimension-reduction operations of the forward diffusion process. Since the noise added during diffusion is Gaussian noise whose standard deviation is controlled by preset hyper-parameters, the generation model can use those same parameters to add Gaussian noise back to the intermediate noise-added image, thereby compensating for the lost noise and restoring the intermediate noise-added image to the original image as far as possible.
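How the preset hyper-parameters determine the standard deviation of the noise to re-add can be sketched as follows, assuming a standard linear variance schedule beta_1..beta_T (an assumption for illustration; the text only states that the standard deviation is controlled by preset hyper-parameters):

```python
import numpy as np

def accumulated_noise_std(betas, t):
    # Under the assumed schedule, step t adds noise of variance beta_t; the
    # total std of the noise accumulated by step t is sqrt(1 - alpha_bar_t),
    # where alpha_bar_t is the cumulative product of (1 - beta).
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t]
    return float(np.sqrt(1.0 - alpha_bar))

betas = np.linspace(1e-4, 0.02, 1000)     # illustrative preset hyper-parameters
early = accumulated_noise_std(betas, 0)   # almost no noise accumulated yet
late = accumulated_noise_std(betas, 999)  # close to 1: nearly pure Gaussian noise
```

During compensation, Gaussian noise with this standard deviation would be added back at the corresponding step.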
For example, after a 16-dimensional noisy image is obtained, an image generation model with noise-compensation and dimension-raising capabilities may first be obtained. The 16-dimensional noisy image is then input into the image generation model for processing, and a 256-dimensional restored image is obtained through the model's noise compensation and dimension raising. Both the dimension raising and the noise compensation follow the inverse of the diffusion process: initialize t = T_K and k = K - 1, and initialize y_{k,t} as Gaussian noise; then perform a single-step inverse diffusion t := t - 1 and judge whether t equals T_k. If not, the next step of noise compensation and dimension raising is not yet due, and it can be judged whether the inverse processing is finished; if yes, the dimension-raising condition is currently met, so y_{k,t} is raised in dimension and the missing noise is compensated, after which k := k - 1. It is then judged whether t equals T_0: if so, the dimension of the raised image is the same as that of the original image, the noise compensation is complete, and the restored image is output; if not, the dimension of the raised image still differs from that of the original image, noise compensation and dimension raising must continue, and the single-step inverse diffusion t := t - 1 is executed again until the restored image can be output.
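The iterative loop in the example above can be sketched as follows; `denoise_step` and `upscale` are illustrative stand-ins for the trained model's single-step inverse diffusion and dimension raising, and `t_bounds` is an assumed encoding of the boundary steps T_0 < T_1 < ... at which the dimension changes.

```python
import numpy as np

def inverse_process(denoise_step, upscale, t_bounds, num_levels, shape,
                    sigma=1.0, rng=None):
    # t_bounds[k] is the diffusion step at which level k begins: t_bounds[0]
    # corresponds to T_0 (original dimension) and t_bounds[num_levels] to the
    # total number of diffusion steps.
    rng = np.random.default_rng(0) if rng is None else rng
    k = num_levels - 1
    t = t_bounds[num_levels]
    y = rng.normal(size=shape)                # initialise y as Gaussian noise
    while t > 0:
        t -= 1                                # single-step inverse diffusion
        y = denoise_step(y, t)
        if k > 0 and t == t_bounds[k]:        # boundary reached: raise dimension
            y = upscale(y)
            y = y + rng.normal(0.0, sigma, size=y.shape)  # compensate lost noise
            k -= 1
    return y                                  # t reached T_0: original scale

# Toy usage: identity "denoising" and 4x nearest-neighbour up-sampling.
restored = inverse_process(
    denoise_step=lambda y, t: y,
    upscale=lambda y: y.repeat(4, axis=0).repeat(4, axis=1),
    t_bounds=[0, 500, 1000], num_levels=2, shape=(16, 16))
print(restored.shape)  # (64, 64)
```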
In conclusion, by performing the inverse processing of the diffusion process on the noisy image through a generation model that integrates the image noise-adding parameters, the dimension raising and noise compensation can both be completed within the inverse processing, so that the model has stronger prediction capability, and a generation model with lower fitting difficulty can be trained quickly.
In conclusion, by processing the image with the variable-dimension generation model, the image can be restored to content close to the real scene, on which basis downstream services can be used, thereby meeting the service requirements.
Corresponding to the above method embodiment, the present specification further provides an image processing apparatus embodiment, and fig. 5 shows a schematic structural diagram of an image processing apparatus provided in an embodiment of the present specification. As shown in fig. 5, the apparatus includes:
an image obtaining module 502 configured to obtain an image to be processed uploaded by a user terminal;
a model processing module 504 configured to input the image to be processed into a target generation model in the above method for processing, so as to obtain a target image;
a determine object module 506 configured to determine a target object based on the target image and load object information corresponding to the target object;
a sending information module 508 configured to send the object information to the user terminal.
In an optional embodiment, the model processing module 504 is further configured to:
inputting the image to be processed into the target generation model; performing dimension-raising processing on the image to be processed to obtain an intermediate noise-added image; and performing noise compensation processing on the intermediate noise-added image to obtain the target image output by the target generation model.
In an optional embodiment, the model processing module 504 is further configured to:
determining image loss information corresponding to the intermediate noisy image; and carrying out noise compensation processing on the intermediate noise-added image according to the image loss information to obtain the target image.
The above is a schematic configuration of an image processing apparatus of the present embodiment. It should be noted that the technical solution of the image processing apparatus belongs to the same concept as the technical solution of the image processing method, and details that are not described in detail in the technical solution of the image processing apparatus can be referred to the description of the technical solution of the image processing method.
Fig. 6 is a flowchart illustrating another image processing method provided in accordance with an embodiment of the present specification, which includes the following steps.
Step S602, acquiring an initial shopping search image uploaded by a user terminal;
step S604, inputting the initial shopping search image into the target generation model in the method for processing, and obtaining a target shopping search image corresponding to the initial shopping search image;
step S606, determining related commodities based on the target shopping search image, and loading commodity information corresponding to the related commodities;
step S608, sending the commodity information to the user terminal, where the user terminal generates a commodity recommendation interface based on the commodity information and displays the commodity recommendation interface.
Specifically, the user terminal refers to a terminal held by a user with a commodity purchase demand, including but not limited to a mobile phone, a computer, a tablet computer, or the like. Accordingly, the initial shopping search image refers to the image a user submits when searching for a commodity. The target shopping search image is the image obtained after processing by the target generation model; it is clearer than the initial shopping search image and meets the business search requirement. The related commodities are the commodities that can be retrieved based on the target shopping search image, and the commodity information is the related information corresponding to those commodities, used to display a commodity detail interface at the user terminal.
Based on the above, after the initial shopping search image uploaded by the user terminal is obtained, it can be input into the target generation model in the above method for processing, yielding the target shopping search image corresponding to the initial shopping search image; related commodities can then be determined based on the target shopping search image, and the commodity information corresponding to the related commodities can be loaded; finally, the commodity information is sent to the user terminal, which generates and displays a commodity recommendation interface based on it.
For example, after the user a takes an image of the article 1 and uploads the image, the image may be input to the image generation model to be processed to obtain a target image which has higher definition and conforms to the application of the commodity search scene, and then the commodity a and the commodity B are determined based on the target image, and information corresponding to the commodity a and the commodity B is loaded to be sent to the client of the user a, so as to display a commodity recommendation interface at least including the commodity a and the commodity B.
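The flow of steps S602 to S608 can be sketched as follows; the function names and the in-memory search index are illustrative stand-ins, not part of the original method.

```python
def shopping_search(initial_image, generate, search_index):
    # S604: enhance the uploaded image with the target generation model.
    target_image = generate(initial_image)
    # S606: determine related commodities based on the target image.
    related = search_index(target_image)
    # S608: load and return the commodity information used to build the
    # recommendation interface at the user terminal.
    return [{"commodity": name, "info": f"details of {name}"} for name in related]

# Toy usage: identity "model" and a fixed two-commodity index.
results = shopping_search("photo-of-item-1",
                          generate=lambda img: img,
                          search_index=lambda img: ["commodity A", "commodity B"])
print(len(results))  # 2
```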
In conclusion, by processing the image with the variable-dimension generation model, the image can be restored to content close to the real scene, on which basis downstream services can be used, thereby meeting the service requirements.
Corresponding to the above method embodiment, the present specification further provides an image processing apparatus embodiment, and fig. 7 shows a schematic structural diagram of another image processing apparatus provided in an embodiment of the present specification. As shown in fig. 7, the apparatus includes:
an image acquisition module 702 configured to acquire an initial shopping search image uploaded by a user terminal;
an input model module 704, configured to input the initial shopping search image into the target generation model in the above method for processing, and obtain a target shopping search image corresponding to the initial shopping search image;
a loading information module 706 configured to determine a related commodity based on the target shopping search image and load commodity information corresponding to the related commodity;
a sending information module 708 configured to send the commodity information to the user terminal, where the user terminal generates a commodity recommendation interface based on the commodity information and displays the commodity recommendation interface.
In conclusion, by processing the image with the variable-dimension generation model, the image can be restored to content close to the real scene, on which basis downstream services can be used, thereby meeting the service requirements.
The above is a schematic configuration of another image processing apparatus of the present embodiment. It should be noted that the technical solution of the image processing apparatus belongs to the same concept as the technical solution of the image processing method, and details that are not described in detail in the technical solution of the image processing apparatus can be referred to the description of the technical solution of the image processing method.
Fig. 8 is a flowchart illustrating another generative model training method provided in an embodiment of the present disclosure, which is applied to a server and includes the following steps.
Step S802, receiving an original image uploaded by a model demand side.
Step S804, performing diffusion processing on the original image, and performing attenuation processing on the image component of the original image to obtain a noisy image set.
Step S806, determining a noise image according to the original image and the noise image set, and inputting the noise image into an initial generation model for processing to obtain a restored image.
And step S808, performing parameter adjustment on the initial generation model based on the restored image and the original image until a target generation model meeting a training stop condition is obtained.
Step S810, determining model parameters corresponding to the target generation model, and feeding the model parameters back to the model demand side.
The model demand end refers to the end that needs to train the target generation model, and the server end refers to the end that provides the model training service. The server end holds an initial generation model completed through pre-training; when the model demand end needs to use a model, it can send a model training request to the server end, and the server end further trains the initial generation model with the samples provided by the model demand end. In this way, the computing resources of the server end are utilized to assist the model demand end in completing the training of the target generation model, achieving the purposes of saving resources and improving the model training precision.
Optionally, the performing diffusion processing on the original image and performing attenuation processing on an image component of the original image to obtain a noisy image set includes:
normalizing pixel values corresponding to pixel points in the original image to obtain an intermediate image, and performing orthogonal decomposition processing on the intermediate image to obtain an image component; and performing diffusion processing on the intermediate image, and performing attenuation processing on the image component to obtain the noise-added image set.
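A minimal sketch of these two steps, assuming pixel values in [0, 255] and an orthonormal basis obtained from a QR decomposition (both assumptions; the text does not fix the basis):

```python
import numpy as np

def normalize_and_decompose(image, basis):
    # Normalise pixel values to [-1, 1], then decompose the flattened image
    # over the orthonormal columns of `basis` into image components.
    x = image.astype(np.float64) / 127.5 - 1.0
    coeffs = basis.T @ x.ravel()          # orthogonal projection
    components = basis * coeffs           # column i is the ith image component
    return x.ravel(), components

# Orthonormal basis for a 2x2 image via QR decomposition (illustrative choice).
basis, _ = np.linalg.qr(np.random.default_rng(0).normal(size=(4, 4)))
img = np.array([[0, 255], [255, 0]], dtype=np.uint8)
x, comps = normalize_and_decompose(img, basis)
print(np.allclose(comps.sum(axis=1), x))  # True: components rebuild the image
```

Because the basis is orthonormal, the components sum back to the normalised image exactly, which is what makes per-component attenuation in the diffusion step well defined.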
Optionally, the performing diffusion processing on the intermediate image and attenuation processing on the image component to obtain the noisy image set includes:
determining an intermediate image corresponding to an ith diffusion period and an ith image component corresponding to the intermediate image; adding ith noise to the intermediate image, and performing attenuation processing on the ith image component; in the case that the attenuation result of the ith image component is not smaller than a component threshold, determining, according to the diffusion processing result, a first noise image with the same image dimension as the intermediate image, taking the first noise image as the intermediate image, incrementing i by 1, and returning to the step of determining the intermediate image corresponding to the ith diffusion period and the ith image component corresponding to the intermediate image; in the case that the attenuation result of the ith image component is smaller than the component threshold, determining, according to the diffusion processing result, a second noise image with an image dimension smaller than that of the intermediate image, taking the second noise image as the intermediate image, incrementing i by 1, and returning to the step of determining the intermediate image corresponding to the ith diffusion period and the ith image component corresponding to the intermediate image; and forming the noise image set from the first noise images and the second noise images in the case that the diffusion processing and the attenuation processing satisfy an iteration stop condition.
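Under toy assumptions (a single scalar attenuation factor standing in for the per-component attenuation, and 2x average pooling standing in for the dimension reduction), the iteration described above could look like:

```python
import numpy as np

def diffuse_with_attenuation(x, steps, decay=0.95, sigma=0.05,
                             threshold=0.1, rng=None):
    # Each period: attenuate the image component and add Gaussian noise; once
    # the cumulative attenuation result falls below the threshold, continue at
    # a reduced dimension. Returns the noise-added image set.
    rng = np.random.default_rng(0) if rng is None else rng
    noisy_set, scale = [], 1.0
    for _ in range(steps):
        scale *= decay                                   # cumulative attenuation
        x = decay * x + rng.normal(0.0, sigma, size=x.shape)
        if scale < threshold and x.size > 1:
            x = 0.5 * (x[0::2] + x[1::2])                # dimension reduction
            scale = 1.0    # illustrative reset for the new resolution level
        noisy_set.append(x)
    return noisy_set

noisy_set = diffuse_with_attenuation(np.ones(8), steps=60)
```

In this sketch, the first images in the set keep the original dimension (the attenuation result is still above the threshold), and later images are reduced in dimension once it drops below.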
Optionally, the determining a noisy image according to the original image and the noisy image set includes:
randomly sampling a first target noise adding image in the noise adding image set, and performing diffusion processing on the original image based on the first target noise adding image to obtain a second target noise adding image; and taking the first target noise-added image and the second target noise-added image as the noise-added images.
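A sketch of this sampling step, where `diffuse_from` is an illustrative stand-in for diffusing the original image based on the sampled target:

```python
import numpy as np

def sample_noisy_pair(original, noisy_set, diffuse_from, rng=None):
    # Randomly sample the first target noise-added image from the set, then
    # diffuse the original image based on it to obtain the second target.
    rng = np.random.default_rng(0) if rng is None else rng
    t = int(rng.integers(len(noisy_set)))
    first = noisy_set[t]
    second = diffuse_from(original, t)
    return first, second

# Toy usage: a set of progressively attenuated images, with one more
# attenuation step standing in for the extra diffusion.
noisy_set = [np.full(4, 0.9 ** (t + 1)) for t in range(10)]
first, second = sample_noisy_pair(np.ones(4), noisy_set,
                                  diffuse_from=lambda x, t: 0.9 ** (t + 2) * x)
```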
Optionally, the performing parameter adjustment on the initial generative model based on the restored image and the original image until a target generative model meeting a training stop condition is obtained includes:
calculating a model loss value corresponding to the initial generation model according to the restored image and the original image; and under the condition that the model loss value is smaller than a preset loss value threshold, determining that the initial generation model meets a training stop condition, and taking the initial generation model as the target generation model.
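The parameter-adjustment loop with this stop condition can be sketched as follows; `model_step` is an illustrative stand-in for one round of forward pass plus parameter update that returns the current model loss value.

```python
def train_until_stop(model_step, loss_threshold, max_iters=1000):
    # Keep adjusting parameters until the model loss value falls below the
    # preset loss value threshold (the training stop condition).
    loss = float("inf")
    for it in range(max_iters):
        loss = model_step()
        if loss < loss_threshold:
            return it, loss
    return max_iters, loss

# Toy usage: a "model" whose loss halves on every update.
state = {"loss": 1.0}
def halving_step():
    state["loss"] *= 0.5
    return state["loss"]

iters, final_loss = train_until_stop(halving_step, loss_threshold=0.01)
print(iters, final_loss)  # 6 0.0078125
```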
Optionally, the diffusion process and the attenuation process are calculated by the following formula:
y_{k,t} = D_k · x_{k,t},  where x_{k,t} = Σ_i ( λ_{i,t} · v_i + σ_{i,t} · z_i )
wherein the subscript t represents the current iteration step number of the diffusion process, the subscript k represents the number of dimensionality reductions, y_{k,t} represents the noisy image after k dimensionality reductions and t noise additions in the diffusion process, D_k represents the dimensionality-reduction operator, x_{k,t} corresponds to y_{k,t} and represents the image without dimensionality reduction but with the image components of the original image attenuated, v_i represents the ith component of the original image, λ_{i,t} represents the degree of attenuation of v_i after t iterations, z_i represents the projection of standard Gaussian noise onto the subspace of v_i, and σ_{i,t} represents the standard deviation of the noise added to v_i after t iterations.
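A numerical sketch of the diffusion formula under illustrative assumptions (a 4-pixel image, a QR orthonormal basis, hand-picked λ and σ values), checking only the shapes and the orthogonal decomposition:

```python
import numpy as np

rng = np.random.default_rng(1)
x0 = rng.normal(size=4)                       # original image, flattened
V, _ = np.linalg.qr(rng.normal(size=(4, 4)))  # orthonormal directions of the v_i
c = V.T @ x0                                  # coefficients: v_i = c_i * V[:, i]
lam = np.array([1.0, 0.8, 0.5, 0.1])          # lambda_{i,t}: attenuation degrees
sig = np.array([0.0, 0.1, 0.2, 0.4])          # sigma_{i,t}: per-subspace noise std
z = rng.normal(size=4)                        # z_i: Gaussian noise per subspace
x_kt = V @ (lam * c + sig * z)                # sum_i (lambda v_i + sigma z_i)
y_kt = 0.5 * (x_kt[0::2] + x_kt[1::2])        # D_k: a toy 2x dimension reduction
print(y_kt.shape)  # (2,)
```

Here the average-pooling operator is only a placeholder for D_k; the original components v_i reassemble exactly into x0 because V is orthonormal.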
In summary, in order to complete model training quickly and efficiently and reduce the difficulty of model fitting, after an original image is obtained, diffusion processing may be performed on the original image, meanwhile, attenuation processing may be performed on image components of the original image in the diffusion processing process to obtain a noise-added image set subjected to diffusion and dimension reduction, then, a noise-added image that can be used for model training may be screened out according to the original image and the noise-added image set, and then, the noise-added image may be input to an initial generation model for processing on the basis of the noise-added image set, so that a restored image may be obtained, and finally, a parameter of the initial generation model may be adjusted based on the original image and the restored image until a target generation model satisfying a training stop condition is obtained. The method realizes model generation through variable-dimension diffusion process training, achieves the purpose of improving the training speed and reducing the model fitting difficulty, and accordingly realizes the high-precision model generation training quickly and efficiently.
The above is an exemplary scheme of the other generative model training method of this embodiment. It should be noted that the technical solution of this other generative model training method belongs to the same concept as the technical solution of the first generative model training method; details not described in detail here can be found in the description of the first generative model training method.
Corresponding to the above method embodiment, the present specification further provides another generative model training device embodiment, and fig. 9 shows a schematic structural diagram of another generative model training device provided in an embodiment of the present specification. As shown in fig. 9, the apparatus includes:
an image receiving module 902 configured to receive an original image uploaded by a model demand side;
an image processing module 904, configured to perform diffusion processing on the original image and perform attenuation processing on an image component of the original image to obtain a noisy image set;
a determining image module 906, configured to determine a noisy image according to the original image and the noisy image set, and input the noisy image into an initial generation model for processing, so as to obtain a restored image;
a model training module 908 configured to perform parameter adjustment on the initial generative model based on the restored image and the original image until a target generative model satisfying a training stop condition is obtained;
a parameter sending module 910 configured to determine a model parameter corresponding to the target generation model, and feed back the model parameter to the model requiring end.
Optionally, the performing diffusion processing on the original image and performing attenuation processing on an image component of the original image to obtain a noisy image set includes:
normalizing pixel values corresponding to pixel points in the original image to obtain an intermediate image, and performing orthogonal decomposition processing on the intermediate image to obtain an image component; and performing diffusion processing on the intermediate image, and performing attenuation processing on the image component to obtain the noise-added image set.
Optionally, the performing diffusion processing on the intermediate image and attenuation processing on the image component to obtain the noisy image set includes:
determining an intermediate image corresponding to an ith diffusion period and an ith image component corresponding to the intermediate image; adding ith noise to the intermediate image, and performing attenuation processing on the ith image component; in the case that the attenuation result of the ith image component is not smaller than a component threshold, determining, according to the diffusion processing result, a first noise image with the same image dimension as the intermediate image, taking the first noise image as the intermediate image, incrementing i by 1, and returning to the step of determining the intermediate image corresponding to the ith diffusion period and the ith image component corresponding to the intermediate image; in the case that the attenuation result of the ith image component is smaller than the component threshold, determining, according to the diffusion processing result, a second noise image with an image dimension smaller than that of the intermediate image, taking the second noise image as the intermediate image, incrementing i by 1, and returning to the step of determining the intermediate image corresponding to the ith diffusion period and the ith image component corresponding to the intermediate image; and forming the noise image set from the first noise images and the second noise images in the case that the diffusion processing and the attenuation processing satisfy an iteration stop condition.
Optionally, the determining a noisy image according to the original image and the noisy image set includes:
randomly sampling a first target noise-added image in the noise-added image set, and performing diffusion processing on the original image based on the first target noise-added image to obtain a second target noise-added image; and taking the first target noise-added image and the second target noise-added image as the noise-added images.
Optionally, the performing parameter adjustment on the initial generative model based on the restored image and the original image until a target generative model meeting a training stop condition is obtained includes:
calculating a model loss value corresponding to the initial generation model according to the restored image and the original image; and under the condition that the model loss value is smaller than a preset loss value threshold, determining that the initial generation model meets a training stop condition, and taking the initial generation model as the target generation model.
Optionally, the diffusion process and the attenuation process are calculated by the following formula:
y_{k,t} = D_k · x_{k,t},  where x_{k,t} = Σ_i ( λ_{i,t} · v_i + σ_{i,t} · z_i )
wherein the subscript t represents the current iteration step number of the diffusion process, the subscript k represents the number of dimensionality reductions, y_{k,t} represents the noisy image after k dimensionality reductions and t noise additions in the diffusion process of the original image, D_k represents the dimensionality-reduction operator, x_{k,t} corresponds to y_{k,t} and represents the image without dimensionality reduction but with the image components of the original image attenuated, v_i represents the ith component of the original image, λ_{i,t} represents the degree of attenuation of v_i after t iterations, z_i represents the projection of standard Gaussian noise onto the subspace of v_i, and σ_{i,t} represents the standard deviation of the noise added to v_i after t iterations.
In summary, in order to complete model training quickly and efficiently and reduce the difficulty of model fitting, after an original image is obtained, diffusion processing may be performed on the original image, meanwhile, attenuation processing may be performed on image components of the original image in the diffusion processing process to obtain a noise-added image set subjected to diffusion and dimension reduction, then, a noise-added image that can be used for model training may be screened out according to the original image and the noise-added image set, and then, the noise-added image may be input to an initial generation model for processing on the basis of the noise-added image set, so that a restored image may be obtained, and finally, a parameter of the initial generation model may be adjusted based on the original image and the restored image until a target generation model satisfying a training stop condition is obtained. The method realizes model generation through variable-dimension diffusion process training, achieves the purpose of improving the training speed and reducing the model fitting difficulty, and accordingly realizes the high-precision model generation training quickly and efficiently.
The above is a schematic solution of another generative model training apparatus according to the present embodiment. It should be noted that the technical solution of the generative model training device and the technical solution of the generative model training method described above belong to the same concept, and details of the technical solution of the generative model training device, which are not described in detail, can be referred to the description of the technical solution of the generative model training method described above.
The following description further describes the generative model training method with reference to fig. 10 by taking an application of the generative model training method provided in this specification in an actual application scenario as an example. Fig. 10 is a flowchart illustrating a processing procedure of a generative model training method according to an embodiment of the present specification, which specifically includes the following steps.
In step S1002, an original image is acquired.
Step S1004, performing normalization processing on pixel values corresponding to the pixel points in the original image to obtain an intermediate image, and performing orthogonal decomposition processing on the intermediate image to obtain an image component.
Step S1006, the intermediate image is subjected to diffusion processing, and the image component is subjected to attenuation processing, so that a noise-added image set is obtained.
Determining an intermediate image corresponding to an ith diffusion period and an ith image component corresponding to the intermediate image; adding ith noise to the intermediate image, and performing attenuation processing on the ith image component; in the case that the attenuation result of the ith image component is not smaller than a component threshold, determining, according to the diffusion processing result, a first noise image with the same image dimension as the intermediate image, taking the first noise image as the intermediate image, incrementing i by 1, and returning to the step of determining the intermediate image corresponding to the ith diffusion period and the ith image component corresponding to the intermediate image; in the case that the attenuation result of the ith image component is smaller than the component threshold, determining, according to the diffusion processing result, a second noise image with an image dimension smaller than that of the intermediate image, taking the second noise image as the intermediate image, incrementing i by 1, and returning to the step of determining the intermediate image corresponding to the ith diffusion period and the ith image component corresponding to the intermediate image; and forming the noise image set from the first noise images and the second noise images in the case that the diffusion processing and the attenuation processing satisfy an iteration stop condition.
And step S1008, randomly sampling a first target noise-added image in the noise-added image set, and performing diffusion processing on the original image based on the first target noise-added image to obtain a second target noise-added image.
Step S1010, the first target noisy image and the second target noisy image are taken as noisy images.
In step S1012, an initial generation model is acquired.
And step S1014, inputting the noisy image into the initial generation model for processing to obtain a restored image.
And step S1016, performing parameter adjustment on the initial generation model based on the restored image and the original image until a target generation model meeting the training stop condition is obtained.
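The steps above can be tied together in a toy end-to-end sketch; a single learnable scalar gain stands in for the neural generation model, and all numeric choices are illustrative rather than taken from the original method.

```python
import numpy as np

def train_toy_model(x0, t=10, decay=0.95, sigma=0.02, lr=0.5, iters=200):
    rng = np.random.default_rng(0)
    x = x0 / max(1e-9, float(np.abs(x0).max()))   # S1004: normalisation
    # S1006-S1010: diffuse (attenuate by decay**t, add Gaussian noise).
    noisy = decay ** t * x + rng.normal(0.0, sigma, size=x.shape)
    gain = 0.0                                     # S1012: initial "model"
    for _ in range(iters):                         # S1014-S1016: restore, adjust
        restored = gain * noisy
        grad = 2.0 * float(np.mean((restored - x) * noisy))  # d(MSE)/d(gain)
        gain -= lr * grad
    return gain, float(np.mean((gain * noisy - x) ** 2))

gain, loss = train_toy_model(np.linspace(-1.0, 1.0, 64))
```

After training, the fitted gain roughly inverts the attenuation (about 1/decay**t here) and the remaining loss reflects only the injected noise, which is the role the restored-versus-original loss plays in the real parameter adjustment.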
In summary, in order to complete model training quickly and efficiently and reduce the difficulty of model fitting, after an original image is obtained, diffusion processing may be performed on the original image, meanwhile, attenuation processing may be performed on image components of the original image in the diffusion processing process to obtain a noise-added image set subjected to diffusion and dimension reduction, then, a noise-added image that can be used for model training may be screened out according to the original image and the noise-added image set, and then, the noise-added image may be input to an initial generation model for processing on the basis of the noise-added image set, so that a restored image may be obtained, and finally, a parameter of the initial generation model may be adjusted based on the original image and the restored image until a target generation model satisfying a training stop condition is obtained. The method realizes model generation through variable-dimension diffusion process training, achieves the purpose of improving the training speed and reducing the model fitting difficulty, and accordingly realizes the high-precision model generation training quickly and efficiently.
FIG. 11 illustrates a block diagram of a computing device 1100 provided in accordance with one embodiment of the present description. The components of the computing device 1100 include, but are not limited to, memory 1110 and a processor 1120. The processor 1120 is coupled to the memory 1110 via a bus 1130 and the database 1150 is used to store data.
The computing device 1100 also includes an access device 1140 that enables the computing device 1100 to communicate via one or more networks 1160. Examples of such networks include a public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet. The access device 1140 may include one or more of any type of wired or wireless network interface (e.g., a network interface controller), such as an IEEE 802.11 wireless local area network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a near-field communication (NFC) interface, and so on.
In one embodiment of the application, the above-described components of computing device 1100, as well as other components not shown in FIG. 11, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 11 is for purposes of example only and is not limiting as to the scope of the present application. Other components may be added or replaced as desired by those skilled in the art.
The computing device 1100 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., a tablet, a personal digital assistant, a laptop, a notebook, a netbook, etc.), a mobile phone (e.g., a smartphone), a wearable computing device (e.g., a smartwatch, smart glasses, etc.), or another type of mobile device, or a stationary computing device such as a desktop computer or personal computer (PC). The computing device 1100 may also be a mobile or stationary server.
The processor 1120 is configured to execute computer-executable instructions, which when executed by the processor, implement the steps of the generative model training method and the image processing method described above.
The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device belongs to the same concept as the technical solution of the generative model training method and the image processing method described above, and details of the technical solution of the computing device, which are not described in detail, can be referred to the description of the technical solution of the generative model training method and the image processing method described above.
An embodiment of the present specification further provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the generative model training method and the image processing method described above.
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the generative model training method and the image processing method described above, and details of the technical solution of the storage medium, which are not described in detail, can be referred to the description of the technical solution of the generative model training method and the image processing method described above.
An embodiment of the present specification also provides a computer program, wherein when the computer program is executed in a computer, the computer program causes the computer to execute the steps of the generative model training method and the image processing method described above.
The above is a schematic scheme of a computer program of the present embodiment. It should be noted that the technical solution of the computer program is the same concept as the technical solution of the generative model training method and the image processing method described above, and details of the technical solution of the computer program, which are not described in detail, can be referred to the description of the technical solution of the generative model training method and the image processing method described above.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, computer memory, read-only memory (ROM), random-access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be appropriately added to or removed in accordance with the requirements of legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, computer-readable media may not include electrical carrier signals or telecommunications signals.
It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of acts, but those skilled in the art should understand that the present embodiment is not limited by the order of the acts described, because some steps may be performed in other orders or simultaneously according to the present embodiment. Furthermore, those skilled in the art will appreciate that the embodiments described in this specification are preferred embodiments, and that the acts and modules involved are not necessarily required by this disclosure.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are intended only to aid in describing the specification. Alternative embodiments are not described exhaustively, and the disclosure is not limited to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the embodiments and their practical application, thereby enabling others skilled in the art to understand and utilize the embodiments. This specification is limited only by the claims and their full scope and equivalents.

Claims (13)

1. A generative model training method, the generative model being a machine learning model, comprising:
acquiring an original image;
performing diffusion processing on the original image, and performing attenuation processing on the image component of the original image to obtain a noise-added image set;
determining a noise-added image according to the original image and the noise-added image set, and inputting the noise-added image into an initial generation model for processing to obtain a restored image;
and performing parameter adjustment on the initial generation model based on the restored image and the original image until a target generation model meeting a training stop condition is obtained.
2. The method of claim 1, wherein the diffusing the original image and attenuating the image components of the original image to obtain a noisy image set comprises:
normalizing pixel values corresponding to pixel points in the original image to obtain an intermediate image, and performing orthogonal decomposition processing on the intermediate image to obtain an image component;
and performing diffusion processing on the intermediate image, and performing attenuation processing on the image component to obtain the noise-added image set.
3. The method of claim 2, wherein diffusing the intermediate image and attenuating the image component to obtain the noisy image set, comprises:
determining an intermediate image corresponding to an ith diffusion period and an ith image component corresponding to the intermediate image;
adding ith noise to the intermediate image, and performing attenuation processing on the ith image component;
under the condition that the attenuation result of the ith image component is not smaller than a component threshold, determining a first noise image having the same image dimension as the intermediate image according to the diffusion processing result, taking the first noise image as the intermediate image, incrementing i by 1, and returning to the step of determining the intermediate image corresponding to the ith diffusion period and the ith image component corresponding to the intermediate image;
under the condition that the attenuation result of the ith image component is smaller than the component threshold, determining a second noise image having an image dimension smaller than that of the intermediate image according to the diffusion processing result, taking the second noise image as the intermediate image, incrementing i by 1, and returning to the step of determining the intermediate image corresponding to the ith diffusion period and the ith image component corresponding to the intermediate image;
and forming the noise-added image set according to the first noise image and the second noise image under the condition that the diffusion processing and the attenuation processing satisfy an iteration stop condition.
4. The method according to any one of claims 1-3, wherein said determining a noise-added image according to the original image and the noise-added image set comprises:
randomly sampling a first target noise adding image in the noise adding image set, and performing diffusion processing on the original image based on the first target noise adding image to obtain a second target noise adding image;
and taking the first target noise-added image and the second target noise-added image as the noise-added images.
5. The method of any one of claims 1-3, wherein said performing parameter adjustment on the initial generation model based on the restored image and the original image until a target generation model satisfying a training stop condition is obtained comprises:
calculating a model loss value corresponding to the initial generation model according to the restored image and the original image;
and under the condition that the model loss value is smaller than a preset loss value threshold, determining that the initial generation model meets a training stop condition, and taking the initial generation model as the target generation model.
6. A generative model training apparatus comprising:
an acquisition module configured to acquire an original image;
a processing module configured to perform diffusion processing on the original image and perform attenuation processing on image components of the original image to obtain a noise-added image set;
the determining module is configured to determine a noise-added image according to the original image and the noise-added image set, and input the noise-added image into an initial generation model for processing to obtain a restored image;
a training module configured to perform parameter adjustment on the initial generation model based on the restored image and the original image until a target generation model satisfying a training stop condition is obtained.
7. An image processing method comprising:
acquiring an image to be processed uploaded by a user terminal;
inputting the image to be processed into a target generation model in the method of any one of claims 1 to 5 for processing to obtain a target image;
determining a target object based on the target image, and loading object information corresponding to the target object;
and sending the object information to the user terminal.
8. The method according to claim 7, wherein the inputting the image to be processed into a target generation model for processing to obtain a target image comprises:
inputting the image to be processed into the target generation model;
performing dimension increasing processing on the image to be processed to obtain an intermediate noise image;
and performing noise compensation processing on the intermediate noise-added image to obtain the target image, and outputting the target image from the target generation model.
9. The method of claim 8, wherein said performing noise compensation processing on the intermediate noisy image to obtain the target image comprises:
determining image loss information corresponding to the intermediate noisy image;
and carrying out noise compensation processing on the intermediate noise-added image according to the image loss information to obtain the target image.
10. An image processing method comprising:
acquiring an initial shopping search image uploaded by a user terminal;
inputting the initial shopping search image into a target generation model in the method of any one of claims 1 to 5 for processing to obtain a target shopping search image corresponding to the initial shopping search image;
determining a related commodity based on the target shopping search image, and loading commodity information corresponding to the related commodity;
and sending the commodity information to the user terminal, wherein the user terminal generates a commodity recommendation interface based on the commodity information and displays the commodity recommendation interface.
11. A generative model training method, applied to a server, the generative model being a machine learning model, the method comprising:
receiving an original image uploaded by a model demand side;
performing diffusion processing on the original image, and performing attenuation processing on the image component of the original image to obtain a noise-added image set;
determining a noise-added image according to the original image and the noise-added image set, and inputting the noise-added image into an initial generation model for processing to obtain a restored image;
adjusting parameters of the initial generation model based on the restored image and the original image until a target generation model meeting a training stop condition is obtained;
and determining model parameters corresponding to the target generation model, and feeding the model parameters back to the model demand side.
12. A computing device, comprising:
a memory and a processor;
the memory is for storing computer-executable instructions, and the processor is for executing the computer-executable instructions, which when executed by the processor implement the steps of the method of any one of claims 1 to 5 or 7 to 9 or 10 or 11.
13. A computer readable storage medium storing computer executable instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 5 or 7 to 9 or 10 or 11.
CN202211363105.4A 2022-11-02 2022-11-02 Generative model training method and device Pending CN115641485A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211363105.4A CN115641485A (en) 2022-11-02 2022-11-02 Generative model training method and device


Publications (1)

Publication Number Publication Date
CN115641485A true CN115641485A (en) 2023-01-24

Family

ID=84947388



Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116108157A (en) * 2023-04-11 2023-05-12 阿里巴巴达摩院(杭州)科技有限公司 Method for training text generation model, text generation method and device
CN116433501A (en) * 2023-02-08 2023-07-14 阿里巴巴(中国)有限公司 Image processing method and device
CN116778011A (en) * 2023-05-22 2023-09-19 阿里巴巴(中国)有限公司 Image generating method
CN116958766A (en) * 2023-07-04 2023-10-27 阿里巴巴(中国)有限公司 Image processing method



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination