WO2024109910A1 - Generative model training method, data conversion method and apparatus - Google Patents

Generative model training method, data conversion method and apparatus

Info

Publication number
WO2024109910A1
Authority
WO
WIPO (PCT)
Prior art keywords
diffusion
model
data
input
score value
Application number
PCT/CN2023/133865
Other languages
English (en)
French (fr)
Inventor
Weijian Luo (罗维俭)
Tianyang Hu (胡天阳)
Jiacheng Sun (孙嘉城)
Shifeng Zhang (张世枫)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2024109910A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0475 Generative networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/096 Transfer learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration

Definitions

  • Generative models have a wide range of application scenarios and great value. They can be used to achieve a variety of tasks, such as high-resolution image generation, text-to-image conversion, text-to-speech or speech generation.
  • Implicit generative models, represented by generative adversarial networks (GANs), introduce a game between a discriminative network and a generative network, and use adversarial training to learn the transformation from noise to the data distribution.
  • However, with the introduction of adversarial training, the optimization process is unstable and training is prone to collapse. Therefore, how to achieve more stable generative model training has become an urgent problem to be solved.
  • the present application provides a generative model training method, a data conversion method and a device applied in the field of artificial intelligence, which are used to obtain a generative model with better output effect based on more stable training, and use the generative model to perform data conversion required by users.
  • In a first aspect, the present application provides a generative model training method. First, data in a noise set is used as input of a generative model, which outputs at least one generated sample; the generative model is used to perform data conversion on the input data, and the noise set may include multiple frames of noise data, such as received noise data or randomly generated data. Then, the at least one generated sample is used as input of a first diffusion model, which outputs at least one first diffusion score value; the first diffusion model is used to diffuse each generated sample at least once and score the diffused data, which is equivalent to using the first diffusion model to score the output effect of the generative model. Finally, the generative model is updated according to the at least one first diffusion score value and at least one second diffusion score value output by a second diffusion model, to obtain an updated generative model. The second diffusion model is trained using a real sample set in which each sample includes a corresponding label, and is used to diffuse the input data at least once and score the diffused data.
  • In this way, two diffusion models are used to diffuse real samples and generated samples respectively, reducing the distance between the two distributions so that a loss value between real samples and generated samples can be calculated; the generative model is then updated backward from this loss without adversarial training, improving the stability of optimizing the model.
  • In some implementations, the first diffusion model can be updated first, and the loss value can then be calculated from the diffusion score values output again by the updated diffusion model, so that the generative model is updated using the more accurate score values; this achieves a stable update of the generative model (a minimal sketch of this two-step iteration is given below).
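As an illustration of this two-step iteration, the following is a minimal PyTorch-style sketch of one training iteration. It assumes a pre-trained, frozen second diffusion model; the module and method names (`generator`, `diff_gen`, `diff_real`, `dsm_loss`, `score`) and the squared score-difference loss are illustrative stand-ins, not the patent's exact formulation.

```python
import torch

def train_step(generator, diff_gen, diff_real, opt_g, opt_d, noise, sigmas):
    """One training iteration (illustrative). diff_gen is the first diffusion
    model, diff_real is the pre-trained second diffusion model (kept frozen)."""
    fake = generator(noise)

    # Step 1: update the first diffusion model on the current generated
    # samples, so it scores the generated distribution (first score values).
    opt_d.zero_grad()
    d_loss = diff_gen.dsm_loss(fake.detach(), sigmas)  # denoising score matching
    d_loss.backward()
    opt_d.step()

    # Step 2: update the generator by matching the updated first diffusion
    # model's scores (third score values) to the second diffusion model's
    # scores (second score values) at the same diffusion scale.
    opt_g.zero_grad()
    t = torch.randint(0, len(sigmas), (fake.shape[0],))
    sigma = sigmas[t].view(-1, *([1] * (fake.dim() - 1)))
    noised = fake + sigma * torch.randn_like(fake)
    g_loss = ((diff_gen.score(noised, t) - diff_real.score(noised, t)) ** 2).mean()
    g_loss.backward()  # gradients reach the generator through `noised`
    opt_g.step()       # only generator parameters are updated here
    return d_loss.item(), g_loss.item()
```

Iterating `train_step` until one of the convergence conditions described later is met yields the updated generative model.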
  • the aforementioned method provided in the present application may further include: first, using samples in the real sample set as input to the second diffusion model, and outputting at least one fourth diffusion score value; then updating the second diffusion model according to the at least one fourth diffusion score value to obtain an updated second diffusion model; and again using samples in the real sample set as input to the updated second diffusion model, and outputting at least one second diffusion score value.
  • the second diffusion model can be pre-trained, so that in the training process of the generative model, the second diffusion model can directly output a better diffusion score value, which is equivalent to using the pre-trained second diffusion model as a teacher model and the generative model as a student model for distillation to obtain a generative model with better output effect.
  • In this case, there is no need to additionally train the second diffusion model, which can reduce training overhead and improve training efficiency.
  • Specifically, the first diffusion model is used to: add noise to the first generated sample according to a preset step size to obtain at least one first noise sample; and use the at least one first noise sample as the input of the first score function to output at least one first diffusion score value. By adding noise, the first diffusion model brings the noised samples closer to the real-sample distribution, which facilitates the subsequent loss calculation and thereby enables a stable update of the generative model.
  • When samples in the real sample set are used as inputs to the second diffusion model, the second diffusion model is used to: add noise to the samples in the real sample set according to a preset step size to obtain at least one second noise sample; and use the at least one second noise sample as input to the second score function to obtain at least one second diffusion score value.
  • the first diffusion model and the second diffusion model can use the same diffusion step size for diffusion, so that samples with the same diffusion scale can be scored.
  • the generative model is used to perform one or more of the following tasks: converting input text into an image, converting input speech into an image, performing data completion on an input image, converting input text into speech, or converting the resolution of an input image. Therefore, the generative model provided by the present application can be applied to a variety of scenarios, has diversity, and has strong generalization capabilities.
  • the present application provides a data conversion method, comprising:
  • Input data is used as input of a generative model to obtain an output result.
  • the generative model is used to extract features from the input data, and use the extracted features to perform modeling to obtain an output result.
  • the generative model is used to extract features from the input data, and generate data of a preset type according to the extracted features.
  • the generative model is trained according to the output results of a first diffusion model and a second diffusion model.
  • the first diffusion model is trained using the output samples of the generative model before training is completed, and the second diffusion model is trained using a real sample set. Each sample in the real sample set includes a corresponding label.
  • the second diffusion model is used to diffuse the input data at least once and score the diffused data. The parameters of the first diffusion model and the second diffusion model are different.
  • a first diffusion model that is updated using the output data of the generative model to be trained can be set, and the second diffusion model is a model obtained by training using a set of real samples. Calculating the loss value based on the difference between the outputs of the first diffusion model and the second diffusion model and updating the generative model is equivalent to using the second diffusion model as a teacher model and the first diffusion model as a student model for knowledge distillation. Therefore, there is no need for adversarial training, and more stable and efficient training can be achieved.
  • the method provided in the present application can match the score function between the distribution of real data and generated data at multiple diffusion scales, thereby achieving efficient non-adversarial training of implicit generative models.
  • the generative model is used to perform one or more of the following tasks: converting input text into an image, converting input speech into an image, performing data completion on an input image, converting input text into speech, or converting the resolution of an input image. Therefore, the generative model provided by the present application can be applied to a variety of scenarios, has diversity, and has strong generalization capabilities.
  • the present application provides a generative model training device, comprising:
  • a generation module used to use the data in the noise set as input of a generation model, and output at least one generation sample, wherein the generation model is used to perform data conversion on the input data, and the noise set includes multiple frames of noise data;
  • a first diffusion module used to take at least one generated sample as an input of a first diffusion model, and output at least one first diffusion score value, wherein the first diffusion model is used to diffuse each generated sample at least once and score the diffused data;
  • a training module is used to update the generation model according to at least one first diffusion score value and at least one second diffusion score value output by the second diffusion model to obtain an updated generation model, wherein the second diffusion model is trained using a real sample set, each sample in the real sample set includes a corresponding label, and the second diffusion model is used to diffuse the input data at least once and score the diffused data, the parameters of the first diffusion model and the second diffusion model are different, and the updated generation model is used to extract features from data input by a user in a computing device and generate corresponding data based on the extracted features.
  • the training module is specifically used to:
  • update the first diffusion model using the at least one first diffusion score value to obtain an updated first diffusion model; use the at least one generated sample as the input of the updated first diffusion model to output at least one third diffusion score value, where the at least one second diffusion score value corresponds one-to-one to the at least one third diffusion score value; and obtain the at least one second diffusion score value output by the second diffusion model, and update the generative model according to the loss value between each third diffusion score value and the corresponding second diffusion score value, to obtain an updated generative model.
  • the apparatus further includes: a second diffusion module, configured to use samples in the real sample set as inputs of the second diffusion model, and output at least one fourth diffusion score value;
  • the training module is further used to update the second diffusion model according to at least one fourth diffusion score value to obtain an updated second diffusion model; use samples in the real sample set as input of the updated second diffusion model, and output at least one second diffusion score value.
  • the second diffusion model is a model pre-trained with a real sample set.
  • the training module is further configured to extract at least one second diffusion score value from the second diffusion model.
  • the first diffusion model is used to: add noise to the first generated sample according to a preset step length to obtain at least one first noise sample; use the at least one first noise sample as an input of a first score function, and output at least one first diffusion score value.
  • the second diffusion model when samples in a real sample set are used as inputs to a second diffusion model, the second diffusion model is used to: add noise to samples in the real sample set according to a preset step size to obtain at least one second noise sample; and use at least one second noise sample as input to a second scoring function to obtain at least one second diffusion score value.
  • the generative model is used to perform one or more of the following tasks: converting input text into an image, converting input speech into an image, performing data completion on an input image, converting input text into speech, or converting the resolution of an input image.
  • the present application provides a data conversion device, comprising:
  • a transceiver module used for receiving input data, the input data including data input by a user;
  • a generation module is used to use the input data as the input of the generation model to obtain the output result.
  • the generation model is used to extract features from the input data and use the extracted features to perform modeling to obtain the output result;
  • the generation model is used to extract features from input data and generate preset types of data according to the extracted features.
  • the generation model is trained according to the output results of the first diffusion model and the second diffusion model.
  • the first diffusion model is trained using the output samples of the generation model before training is completed, and the second diffusion model is trained using a real sample set. Each sample in the real sample set includes a corresponding label.
  • the second diffusion model is used to diffuse the input data at least once and score the diffused data.
  • the parameters of the first diffusion model and the second diffusion model are different.
  • the generative model is used to perform one or more of the following tasks: converting input text into an image, converting input speech into an image, performing data completion on an input image, converting input text into speech, or converting the resolution of an input image.
  • an embodiment of the present application provides a generative model training device, which has the function of implementing the method of the first aspect above.
  • the function can be implemented by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • an embodiment of the present application provides a data conversion device, which has the function of implementing the method of the first aspect above.
  • the function can be implemented by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • an embodiment of the present application provides a generative model training device, comprising: a processor and a memory, wherein the processor and the memory are interconnected via a line, and the processor calls the program code in the memory to execute the processing-related functions in the generative model training method shown in any one of the first aspects above.
  • the generative model training device can be a chip.
  • an embodiment of the present application provides a data conversion device, comprising: a processor and a memory, wherein the processor and the memory are interconnected via a line, and the processor calls a program code in the memory to execute the processing-related functions in the data conversion method shown in any one of the second aspects above.
  • the data conversion device can be a chip.
  • an embodiment of the present application provides a generative model training device, which can also be called a digital processing chip or chip.
  • the chip includes a processing unit and a communication interface.
  • the processing unit obtains program instructions through the communication interface, and the program instructions are executed by the processing unit.
  • the processing unit is used to perform functions related to processing in the above-mentioned first aspect or any optional embodiment of the first aspect.
  • an embodiment of the present application provides a data conversion device, which can also be called a digital processing chip or chip.
  • the chip includes a processing unit and a communication interface.
  • the processing unit obtains program instructions through the communication interface, and the program instructions are executed by the processing unit.
  • the processing unit is used to perform functions related to processing as described in the second aspect or any optional embodiment of the second aspect.
  • an embodiment of the present application provides a computer-readable storage medium, including instructions, which, when executed on a computer, enables the computer to execute a method in any optional implementation of the first aspect or the second aspect above.
  • an embodiment of the present application provides a computer program product comprising instructions, which, when executed on a computer, enables the computer to execute a method in any optional implementation of the first aspect or the second aspect above.
  • FIG1 is a schematic diagram of a system architecture provided by the present application.
  • FIG2 is a schematic diagram of another system architecture provided by the present application.
  • FIG3 is a schematic diagram of another system architecture provided by the present application.
  • FIG4 is a flow chart of a generative model training method provided in the present application.
  • FIG5 is a flow chart of another generative model training method provided in the present application.
  • FIG6 is a flow chart of another generative model training method provided in the present application.
  • FIG7 is a schematic diagram of a diffusion effect provided by the present application.
  • FIG8 is a flow chart of another generative model training method provided by the present application.
  • FIG9 is a flow chart of another generative model training method provided in the present application.
  • FIG10 is a flow chart of a data conversion method provided by the present application.
  • FIG11 is a schematic diagram of a generation effect provided by the present application.
  • FIG12 is a schematic diagram of a training effect provided by the present application.
  • FIG13 is another schematic diagram of a generation effect provided by the present application.
  • FIG14 is another schematic diagram of a generation effect provided by the present application.
  • FIG15 is another schematic diagram of a generation effect provided by the present application.
  • FIG16 is another schematic diagram of a generation effect provided by the present application.
  • FIG17 is another schematic diagram of a generation effect provided by the present application.
  • FIG18 is a schematic diagram of the structure of a generative model training device provided by the present application.
  • FIG19 is a schematic diagram of the structure of a data conversion device provided by the present application.
  • FIG20 is a schematic diagram of the structure of another generative model training device provided by the present application.
  • FIG21 is a schematic diagram of the structure of another data conversion device provided by the present application.
  • FIG22 is a schematic diagram of the structure of a chip provided in the present application.
  • the "intelligent information chain” reflects a series of processes from data acquisition to processing. For example, it can be a general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, the data has undergone a condensed process of "data-information-knowledge-wisdom".
  • the "IT value chain” reflects the value that artificial intelligence brings to the information technology industry from the underlying infrastructure of human intelligence, information (providing and processing technology implementation) to the industrial ecological process of the system.
  • The infrastructure provides computing power support for the artificial intelligence system, enables communication with the outside world, and is supported by the basic platform. Communication with the outside world is achieved through sensors; computing power is provided by smart chips, such as central processing units (CPU), neural-network processing units (NPU), graphics processing units (GPU), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA) and other hardware acceleration chips; the basic platform includes distributed computing frameworks, networks and other related platform guarantees and support, which can include cloud storage and computing, interconnected networks, etc. For example, sensors communicate with the outside world to obtain data, and the data is provided to the smart chips in the distributed computing system provided by the basic platform for calculation.
  • the data on the upper layer of the infrastructure is used to represent the data sources in the field of artificial intelligence.
  • the data involves graphics, images, voice, text, and IoT data of traditional devices, including business data of existing systems and perception data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
  • Machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, and training on data.
  • Reasoning refers to the process of simulating human intelligent reasoning in computers or intelligent systems, using formalized information to perform machine thinking and solve problems based on reasoning control strategies. Typical functions are search and matching.
  • Decision-making refers to the process of making decisions after intelligent information is reasoned, usually providing functions such as classification, sorting, and prediction.
  • some general capabilities can be further formed based on the results of the data processing, such as an algorithm or a general system, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, etc.
  • Smart products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They are the encapsulation of the overall artificial intelligence solution, which productizes intelligent information decision-making and realizes practical applications. Its application areas mainly include: smart terminals, smart transportation, smart medical care, autonomous driving, smart cities, etc.
  • the method provided in this application involves related concepts of neural networks. To facilitate understanding, the related concepts of neural networks involved are first introduced below.
  • a neural network can be composed of neural units.
  • a neural unit can refer to an operation unit that takes inputs $x_s$, and the output of the operation unit can be: $h_{W,b}(x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$, where $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to perform nonlinear transformation on the features obtained in the neural network and convert the input signal in the neural unit into the output signal.
  • the output signal of the activation function can be used as the input of the next convolution layer.
  • the activation function can be a sigmoid function.
  • a neural network is a network formed by connecting many of the above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of the local receptive field.
  • the local receptive field can be an area composed of several neural units.
  • Convolutional neural network is a deep neural network with a convolutional structure.
  • Convolutional neural network contains a feature extractor consisting of a convolution layer and a subsampling layer, which can be regarded as a filter.
  • Convolutional layer refers to the neuron layer in the convolutional neural network that performs convolution processing on the input signal.
  • a neuron can only be connected to some neurons in the adjacent layers.
  • a convolutional layer usually contains several feature planes, each of which can be composed of some rectangularly arranged neural units. The neural units in the same feature plane can share weights, and the shared weights here are convolution kernels. Shared weights can be understood as the way to extract features is independent of position.
  • Convolution kernels can be formalized as matrices of random sizes, and convolution kernels can obtain reasonable weights through learning during the training process of convolutional neural networks.
  • the direct benefit of shared weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
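To make the weight-sharing point concrete, here is a small, self-contained example (generic PyTorch, not taken from the patent): one convolutional layer applies the same kernels at every spatial position.

```python
import torch
import torch.nn as nn

# 16 kernels of size 3x3 are shared across all spatial positions, so the
# way features are extracted is independent of position.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

x = torch.randn(1, 3, 32, 32)   # one 32x32 RGB image
features = conv(x)              # 16 feature planes, each 32x32
print(features.shape)           # torch.Size([1, 16, 32, 32])

# Shared weights: 16*3*3*3 + 16 = 448 parameters in total, far fewer than
# a fully connected layer between the same input and output sizes.
print(sum(p.numel() for p in conv.parameters()))  # 448
```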
  • a generative model is a model that generates observations randomly, especially given some implicit parameters. It assigns a joint probability distribution to a sequence of observations and annotated data.
  • generative models can be used to model data directly (e.g., sampling data based on the probability density function of a variable), or to establish conditional probability distributions between variables.
  • Conditional probability distributions can be formed by generative models using Bayes' theorem.
  • the implicit generative model is a transformation from noise to real data parameterized by a neural network. After the generative model is trained, random noise is input and high-quality samples can be output.
  • the model is called an implicit model because the model cannot obtain an estimate of the probability density function of the data, but can only sample from it.
  • the generative model mentioned below in this application is the implicit generative model.
  • the generative diffusion model is a type of probabilistic generative model.
  • the model uses a time-dependent score function model s(x, t) (usually a deep neural network) to fit the score function of the probability distribution of the data distribution along a certain type of diffusion process, thereby learning the characteristics of the data distribution.
  • the generative diffusion model generates data by simulating the solution of an inverse stochastic differential equation.
  • Score function refers to the gradient of the logarithmic density function of the probability distribution with respect to the independent variable. It is a description of the probability distribution, and its mathematical expression is $s(x) = \nabla_x \log p(x)$, where $p(x)$ refers to the probability density function and $s(x)$ refers to the score function.
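As a worked example (a standard fact from the literature, not quoted from the patent), for a one-dimensional Gaussian distribution the score function is linear in the independent variable:

```latex
p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
\quad\Longrightarrow\quad
s(x) = \nabla_x \log p(x) = -\frac{x-\mu}{\sigma^2}
```

The score points from $x$ back toward the mean, which is why following the score function denoises a sample.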
  • For an implicit generative model, the generation process is to input low-dimensional random noise into the implicit generative network and output a sample after a single network forward pass.
  • For a diffusion probabilistic model (DPM), the generation process starts from random noise of the same dimension as the data and goes through a denoising process corresponding to the diffusion (noising) process; it generally takes thousands of network forward passes to generate samples.
  • The diffusion process is the solution of a stochastic differential equation. It is a continuous-time Markov process with continuous sample paths; Brownian motion, reflected Brownian motion and the Ornstein-Uhlenbeck process are examples of diffusion processes.
  • The loss function, also called the objective function, is an important equation used to measure the difference between the predicted value and the target value.
  • Common loss functions include squared error, cross-entropy, logarithmic and exponential losses.
  • For example, the squared error can be used as a loss function, defined as $L = \frac{1}{2}(y - \hat{y})^2$; the specific loss function can be selected according to the actual application scenario.
  • Convolutional neural networks can use the error back propagation (BP) algorithm to correct the size of the parameters in the initial network model during the training process, so that the reconstruction error loss of the model becomes smaller and smaller.
  • In the forward pass, transmitting the input signal to the output produces an error loss; the error loss information is propagated backward to update the parameters in the initial model, so that the error loss converges.
  • The back propagation algorithm is a backward pass dominated by the error loss, aiming to obtain the optimal parameters of the model, such as the weight matrix.
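The following minimal example (generic PyTorch, for illustration only) shows the forward pass producing an error loss and backpropagation updating the parameters:

```python
import torch

w = torch.randn(3, requires_grad=True)  # parameters of a tiny model
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor(10.0)                  # target value

y_hat = (w * x).sum()                   # forward pass: predicted value
loss = 0.5 * (y - y_hat) ** 2           # squared-error loss
loss.backward()                         # error back propagation (BP)

with torch.no_grad():
    w -= 0.01 * w.grad                  # gradient step reduces the loss
w.grad.zero_()                          # clear gradients for the next step
```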
  • the method provided in the embodiment of the present application can be executed on a server or on a terminal device.
  • the server may include a local server, an integrated server or a distributed server, etc.
  • the terminal device may include a mobile phone, a tablet personal computer (TPC), a media player, a smart TV, a laptop computer (LC), a personal digital assistant (PDA), a personal computer (PC), a camera, a camcorder, a smart watch, a wearable device (WD) or an autonomous driving vehicle, etc., and the embodiment of the present application does not limit this.
  • an embodiment of the present application provides a system architecture 200 .
  • a data acquisition device 260 can be used to collect training data.
  • the training data is stored in a database 230 , and the training device 220 trains the generated model 201 based on the training data maintained in the database 230 .
  • the training device 220 obtains the generative model 201 based on the training data.
  • Specifically, the training device 220 constructs the generative model based on the attribute heterogeneous graph, and updates the parameters of the generative model through contrastive learning, thereby completing the training of the generative model 201.
  • For the specific training method, please refer to the description of the training method below.
  • the generation model 201 in the embodiment of the present application may specifically be a neural network. It should be noted that, in actual applications, the training data maintained in the database 230 may not all come from the data acquisition device 260, but may also be received from other devices. It should also be noted that the training device 220 may not necessarily train the generation model 201 entirely based on the training data maintained by the database 230, but may also obtain training data from the cloud or other places for model training. The above description should not be used as a limitation on the embodiments of the present application.
  • the generation model 201 obtained by training the training device 220 can be applied to different systems or devices, such as the execution device 210 shown in FIG1 .
  • the execution device 210 can be a terminal, such as a mobile phone terminal, a tablet computer, a laptop computer, augmented reality (AR)/virtual reality (VR), a vehicle terminal, a television, etc., and can also be a server or a cloud.
  • the execution device 210 is configured with a transceiver 212, which can include an input/output (I/O) interface or other wireless or wired communication interfaces, etc., for data interaction with external devices. Taking the I/O interface as an example, a user can input data to the I/O interface through the client device 240.
  • the execution device 210 When the execution device 210 preprocesses the input data, or when the computing module 211 of the execution device 210 performs calculations and other related processing, the execution device 210 can call the data, code, etc. in the data storage system 250 for corresponding processing, and can also store the data, instructions, etc. obtained from the corresponding processing into the data storage system 250.
  • the I/O interface returns the processing result to the client device 240 for providing to the user.
  • the training device 220 can generate a corresponding generation model 201 based on different training data for different goals or different tasks, and the corresponding generation model 201 can be used to achieve the above goals or complete the above tasks, thereby providing the user with the desired results.
  • the user can manually give input data, and the manual giving can be operated through the interface provided by the transceiver 212.
  • The client device 240 can also automatically send input data to the transceiver 212. If the client device 240 is required to send input data automatically, the user can set the corresponding permission in the client device 240. The user can view the results output by the execution device 210 on the client device 240, and the presentation can take specific forms such as display, sound, or action.
  • The client device 240 can also serve as a data collection terminal, collecting the input data fed to the transceiver 212 and the output results of the transceiver 212, as shown in the figure, as new sample data and storing them in the database 230. Alternatively, collection need not go through the client device 240: the transceiver 212 can directly store its input data and output results in the database 230 as new sample data.
  • FIG1 is only a schematic diagram of a system architecture provided in an embodiment of the present application.
  • the positional relationship between the devices, components, modules, etc. shown in the figure does not constitute any limitation.
  • the data storage system 250 is an external memory relative to the execution device 210. In other cases, the data storage system 250 can also be placed in the execution device 210.
  • the system architecture of the application of the generative model construction method provided by the present application can be shown in Figure 2.
  • the server cluster 310 is implemented by one or more servers, and optionally, cooperates with other computing devices, such as data storage, routers, load balancers, etc.
  • the server cluster 310 can use the data in the data storage system 250, or call the program code in the data storage system 250 to implement the steps of the generative model construction method provided by the present application.
  • Each local device can represent any computing device, such as a personal computer, a computer workstation, a smart phone, a tablet computer, a smart camera, a smart car or other type of cellular phone, a media consumption device, a wearable device, a set-top box, a game console, etc.
  • the local device of each user can interact with the server cluster 310 through a communication network of any communication mechanism/communication standard, and the communication network can be a wide area network, a local area network, a point-to-point connection, or any combination thereof.
  • the communication network may include a wireless network, a wired network, or a combination of a wireless network and a wired network.
  • The wireless network includes, but is not limited to: a fifth-generation (5G) mobile communication system, a long-term evolution (LTE) system, a global system for mobile communications (GSM), a code division multiple access (CDMA) network, a wideband code division multiple access (WCDMA) network, wireless fidelity (WiFi), Bluetooth, Zigbee, radio frequency identification (RFID), long-range (LoRa) wireless communication, or near-field communication (NFC), in any one or more combinations.
  • the wired network may include an optical fiber communication network or a network composed of coaxial cables, etc.
  • one or more aspects of the execution device 210 may be implemented by each local device.
  • the local device 301 may provide local data or feedback calculation results to the execution device 210 .
  • the local device 301 implements the functions of the execution device 210 and provides services to its own user, or provides services to the user of the local device 302.
  • the training method of the generative model provided in the present application can be deployed on a local server, on the cloud, or on a local terminal.
  • the generative model obtained by training can be deployed on a terminal, a local server, or a cloud server, etc.
  • the training phase provided in the present application can be deployed in a server, and the server trains the generative model to obtain a trained generative model.
  • the trained model can be deployed on a cloud platform, and the user can obtain the required data through the cloud platform by performing input operations on the client.
  • AI services and products in the cloud sector reflect the on-demand use and purchase characteristics of cloud services, as well as the abstract, diverse, and widely used characteristics of AI technology.
  • For the first type, the AI basic development platform service (a platform-as-a-service, PaaS, offering), public cloud service providers provide users with AI basic development platforms that have sufficient underlying resource support and upper-layer AI algorithm capabilities.
  • the built-in AI development framework and various AI algorithms in the AI basic development platform allow users to quickly build and develop AI models or AI applications that meet personalized needs on the AI basic development platform.
  • For the second type, general AI application cloud services (software-as-a-service, SaaS, offerings), public cloud service providers provide general AI application cloud services through cloud platforms, allowing users to use AI capabilities with zero barriers in various application scenarios.
  • the public cloud AI basic development platform is a PaaS cloud service in the cloud platform. It is a software platform that assists users (also known as tenants, AI developers, etc.) in building, training, and deploying AI models, as well as developing and deploying AI applications, based on the large amount of basic resources and software capabilities owned by the public cloud service provider.
  • the generative model training method provided in the present application can be deployed and executed by a server, and the generative model obtained by training can be deployed on a cloud platform, and can be called by users for a fee in the form of an application program interface (API).
  • the method provided in the present application can be deployed on a cloud platform as a service for users, and an API for calling the service can be provided to users.
  • Users can call the service through the API and input relevant information of the data to be generated, such as text to be converted into images, text to be converted into speech, or images to improve resolution, etc.
  • the service generates the required data for users and improves user experience.
  • the interaction between users and the AI basic development platform mainly includes: users log in to the cloud platform through the client web page, select and purchase cloud services of the AI basic development platform in the cloud platform, and users can perform full-process AI services based on the functions provided by the AI basic development platform.
  • the basic resources supporting any process in an AI platform may be distributed on different physical devices, that is, the hardware devices that actually execute a process are usually server clusters in the same data center, or server clusters distributed in different data centers.
  • These data centers can be the central cloud data centers of cloud service providers or edge data centers provided by cloud service providers to users.
  • the resources in the public cloud can be used to run the model training and model management functions provided in the AI basic development platform
  • the resources in the private cloud can be used to run the data storage and data preprocessing functions provided in the AI basic development platform, which can provide stronger security for user data.
  • the resources of the public cloud can come from the central cloud data center
  • the resources of the private cloud can come from the edge data center.
  • the AI platform can be independently deployed on a server or virtual machine in a data center in a cloud environment, or the AI platform can be distributedly deployed on multiple servers in a data center, or distributedly deployed on multiple virtual machines in a data center.
  • the AI platform provided in this application can also be deployed in a distributed manner in different environments.
  • the AI platform provided in this application can be logically divided into multiple parts, each of which has different functions.
  • a part of the AI platform 100 can be deployed in a computing device in an edge environment (also called an edge computing device), and another part can be deployed in a device in a cloud environment.
  • the edge environment is an environment that is geographically close to the user's terminal computing device, and the edge environment includes edge computing devices, such as edge servers, edge stations with computing capabilities, etc.
  • the various parts of the AI platform 100 deployed in different environments or devices work together to provide users with functions such as training AI models.
  • Generative models are used in a variety of scenarios.
  • GANs and diffusion generative models are commonly used generative models, applied to tasks such as high-resolution image generation, text-to-image conversion, text-to-speech, and speech generation.
  • the implicit generative model represented by GAN introduces a game between the discriminative network and the generative network, and uses adversarial training to learn the transformation from noise to data distribution.
  • the advantages are high generation quality, relatively small model, fast generation speed, and low deployment cost.
  • the disadvantages are that with the introduction of adversarial training, the optimization process is unstable, the training is prone to collapse, it is sensitive to hyperparameters, and the generation diversity is insufficient.
  • State-of-the-art (SOTA) implicit generative models design larger implicit generative networks and use traditional adversarial training methods for training. However, they are very sensitive to the selection of hyperparameters, are difficult to optimize, and require complex regularization techniques to stabilize training.
  • the diffusion generation model relies on the probability diffusion process, which brings the two distributions closer by adding noise, thus reducing the learning difficulty.
  • the process of adding noise causes the data to continuously lose its original information until it eventually becomes white noise.
  • The diffusion probability model learns the reverse of this noising process, thereby obtaining a denoising process.
  • the denoising process allows the data to gradually restore information until it is finally restored to normal and clean data.
  • the SOTA diffusion generation model diffuses data through a diffusion process, and uses a score function network to fit the score function of the diffusion data distribution at multiple diffusion levels.
  • the scheme finally outputs a score function network.
  • the data generation process is achieved by repeatedly iterating the score function network.
  • The core of this scheme is the score function network. The model has many parameters, generally requiring over 1 GB of storage space, and generation is slow: generating a single image requires thousands of steps of large-network forward operations (see the sampling sketch below).
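To illustrate why generation requires so many forward passes, here is a sketch of annealed Langevin-style sampling with a learned score network, a generic technique from the score-based model literature; `score_net` and the noise schedule are placeholders, not the patent's model:

```python
import torch

@torch.no_grad()
def sample(score_net, shape, sigmas, step_size=2e-5, steps_per_level=5):
    """Start from pure noise and iterate the score network from the largest
    noise level down to the smallest, gradually restoring the data."""
    x = torch.randn(shape) * sigmas[0]               # sigmas: large -> small
    for i, sigma in enumerate(sigmas):
        eps = step_size * (sigma / sigmas[-1]) ** 2  # level-dependent step
        for _ in range(steps_per_level):
            t = torch.full((shape[0],), i, dtype=torch.long)
            x = x + eps * score_net(x, t) + (2 * eps).sqrt() * torch.randn_like(x)
    # Total network forward passes: len(sigmas) * steps_per_level.
    return x
```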
  • The implicit-model-accelerated diffusion model attempts to introduce an implicit generative model as a component into the diffusion model structure.
  • the reverse process in the diffusion probability model is modeled through a conditional implicit generative model to improve the generation efficiency of the diffusion probability model.
  • This solution uses a conditional generative network to model the long-distance diffusion process in the probability diffusion model, so it can partially alleviate the shortcoming that the probability diffusion model needs to be iterated repeatedly during operation.
  • the conditional generative network still relies on traditional adversarial training methods for training, which is unstable and prone to training failure.
  • Diffusion-GAN attempts to introduce the diffusion process into implicit generative model training, which has better generation effects than traditional GAN methods.
  • This method diffuses the data distribution and the generated distribution through a random process, and performs adversarial training on the diffused data.
  • this method still uses adversarial training, and there is still instability in training, which may lead to training failure.
  • this application provides a generative model training method, which can obtain a generative model with good generation effect, high diversity, stable training, efficient sampling and convenient deployment. It can be understood that this application proposes a non-adversarial implicit generative model training framework based on the diffusion process, which can overcome the defects of existing solutions and has great value.
  • a generative model training method provided in the present application is described as follows.
  • the noise set may include multiple frames of noise data, which may include randomly generated data, data received from other devices, data randomly input by a user, and the like.
  • The data type of the noise set matches the input data type of the generative model to be trained.
  • the input data type of the generative model may include text, image, speech, or other data types
  • the data type of the noise data in the noise set may also include text, image, speech, or other data types.
  • For example, if the input data type of the generative model is text, the data type of the noise data in the noise set is also text; if the input data type of the generative model is image, the noise data in the noise set may also include images; if the input data of the generative model includes speech, the noise data in the noise set may also include speech.
  • the generation model may be iteratively updated multiple times.
  • the following is illustratively described by taking one iterative update as an example, that is, the following steps 402 to 405 may be iteratively executed.
  • the generative model can be set up with feature extractors for images, speech, and text. After obtaining the noise set, the noise data in the noise set can be used as the input of the generative model.
  • the generative model extracts features from the input noise data, generates corresponding data based on the features, and outputs at least one generated sample.
  • features can also be extracted from the input data through other feature extraction models and then input into the generative model to complete the generation step.
  • the tasks that the generative model can perform may include one or more of converting input text to an image, converting input speech to an image, performing data completion on an input image, converting input text to speech, or converting the resolution of an input image.
  • the generative model can be used to convert the text into a representation vector, and image features can be extracted from the representation vector, and the corresponding image can be generated based on the image features.
  • For example, if the input text includes "animals, cats", the text can be converted into a data type that the neural network can process, that is, into an embedded representation; image features are then extracted from the embedded representation to generate an image including a cat.
  • the data in the noise set may include multiple frames of images.
  • the image in the noise set may be used as the input of the generative model, and features are extracted from the image through the generative model.
  • the pixel values of the pixels to be completed are inferred based on the features to obtain the completed image.
  • the noise set may include multiple frames of speech data.
  • the speech data is used as the input of the generative model, the semantic features of the speech data are identified by the generative model, and the extracted features are processed to generate the corresponding image.
  • In the process of training the generative model, at least two diffusion models need to be introduced; they are referred to as the first diffusion model and the second diffusion model for ease of distinction.
  • the at least one generated sample can be used as the input of the first diffusion model, and the first diffusion model can be used to diffuse the input generated sample at least once, and output a score for the data after each diffusion through a score function to obtain at least one diffusion score, which is called a first diffusion score value for ease of distinction.
  • The first diffusion score value may include the gradient, with respect to the independent variable, of the logarithmic probability density function used in the diffusion, which can be understood as representing the probability distribution.
  • the difference between the first diffusion model and the second diffusion model is usually that the model parameters are different.
  • the input data used are different, and the training effects achieved are different, so the parameters of the first diffusion model and the second diffusion model are different.
  • the second diffusion model may be a model pre-trained using a real sample set, or may be a model that is trained synchronously using the real sample set as input during the process of training a generated model.
  • the real sample set mentioned in this application can be understood as a sample pair.
  • For example, if the generative model is used to generate images from text, the real sample set may include multiple sample pairs, each including text and one or more frames of images corresponding to that text; if the generative model is used to generate images from speech, each sample pair may include speech and one or more frames of images corresponding to that speech; if the generative model is used to improve image resolution, each sample pair may include a low-resolution image and a corresponding high-resolution image, and so on.
  • the diffusion step length during the training of the first diffusion model may be the same as the step length during the training of the second diffusion model, so that second diffusion score values having the same scale as the respective diffusion step lengths in the first diffusion model may be directly extracted from the second diffusion model.
  • the samples in the real sample set can be used as inputs of the second diffusion model to output at least one second diffusion score value.
  • the diffusion step lengths used by the first diffusion model and the second diffusion model can be the same.
  • the same step size can be used for diffusion.
  • the scale of the data obtained after adding noise is usually the same, so that the loss value can be calculated at the same scale later, thereby improving training stability and model convergence efficiency.
  • Specifically, the first diffusion model can add noise to the first generated sample according to a preset step size to obtain at least one first noise sample, and then output the corresponding at least one first diffusion score value through the first score function in the first diffusion model.
  • Similarly, the second diffusion model can add noise to the samples in the real sample set according to the same preset step size to obtain at least one second noise sample; the at least one second noise sample is then used as the input of the second score function to obtain at least one second diffusion score value (a sketch of this shared-schedule noising is given below).
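A minimal sketch of this shared-schedule noising (illustrative; the tensors and schedule are stand-ins, not the patent's exact schedule):

```python
import torch

def diffuse_at_all_scales(x0, sigmas):
    """Add Gaussian noise to a batch at every preset noise scale (step size)."""
    return [x0 + sigma * torch.randn_like(x0) for sigma in sigmas]

sigmas = torch.linspace(0.01, 1.0, steps=10)    # shared preset step schedule
real_batch = torch.randn(8, 3, 32, 32)          # stand-in for real samples
fake_batch = torch.randn(8, 3, 32, 32)          # stand-in for generated samples

second_inputs = diffuse_at_all_scales(real_batch, sigmas)  # second score function inputs
first_inputs = diffuse_at_all_scales(fake_batch, sigmas)   # first score function inputs
```

Because both models see noise added at the same scales, their score values can be compared one-to-one when computing the loss.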
  • the first diffusion model can be updated by at least one first diffusion score value to obtain an updated first diffusion model.
  • The generated samples can then be used as the input of the updated first diffusion model, which outputs at least one new diffusion score value, called a third diffusion score value for ease of distinction.
  • The at least one third diffusion score value corresponds one-to-one to the at least one second diffusion score value; the loss value can be calculated according to the difference between each third diffusion score value and the corresponding second diffusion score value, and the generative model can be updated by backpropagation according to the loss value to obtain an updated generative model.
  • Multiple iterations of training may be performed, that is, steps 402 to 405 are executed repeatedly until the model converges.
  • the convergence conditions of the model may include but are not limited to one or more of the following: the number of iterations reaches a preset number of times, the iteration duration reaches a preset duration, the difference between the outputs of the first diffusion model and the second diffusion model is within a preset range, the change value of the loss function is not greater than a preset change value, etc., which can be selected according to the actual application scenario.
  • a first diffusion model that is updated using the output data of the generative model to be trained can be set, and the second diffusion model is a model obtained by training using a set of real samples.
  • the loss value is calculated based on the difference between the outputs of the first diffusion model and the second diffusion model and the generative model is updated, which is equivalent to using the second diffusion model as a teacher model and the first diffusion model as a student model for knowledge distillation. Therefore, there is no need for adversarial training, and more stable and efficient training can be achieved.
  • the method provided in the present application can match the score function between the distribution of real data and generated data at multiple diffusion scales, thereby achieving efficient non-adversarial training of implicit generative models.
  • the second diffusion model may also be updated before the generative model is updated.
  • the samples in the real sample set may be used as inputs of the second diffusion model, and at least one fourth diffusion score value may be output; the second diffusion model may be updated according to the at least one fourth diffusion score value to obtain an updated second diffusion model.
  • the samples in the real sample set may be used as inputs of the updated second diffusion model, and at least one second diffusion score value may be output.
  • the second diffusion model can be trained synchronously, or the second diffusion model can be pre-trained before training the generative model.
  • below, the generative model training method provided in the present application is introduced from the perspective of these different training modes.
  • FIG. 5 is a flowchart of another generative model training method provided in the present application.
  • first, the model parameters are initialized, including the generative model G_θ and the diffusion models s_φ and s_ψ.
  • iteration steps are then performed until the model converges: in each iteration, the diffusion models s_φ and s_ψ can be updated, and the generative model G_θ can then be updated based on the updated s_φ and s_ψ.
  • there are many ways to judge model convergence, such as the number of iterations reaching a preset number, the iteration duration reaching a preset duration, the difference between the outputs of the diffusion models s_φ and s_ψ being within a preset range, or the change in the loss function being no greater than a preset change value.
  • the specific selection can be based on the actual application scenario.
  • the converged generative model G_θ is then deployed, such as deploying it on a cloud platform to provide services to users through a client, or deploying it on the user's local server or terminal so that the user can perform data conversion locally.
  • the generative model G ⁇ can adopt a variety of network structures, which can usually be selected according to the actual tasks to be performed. For example, a network that meets the hardware load range of the computing device can be built, or a commonly used network architecture can be selected, such as U-Net, CNN, or RNN. The specific network can be determined according to the actual application scenario.
  • the diffusion models s_φ and s_ψ can use the same diffusion process, such as one expressed as p(x_t | x_0); the diffusion process can be understood as a process of adding noise to the sample, for example expressed as the stochastic differential equation dx_t = f(x_t, t)dt + g(t)dW_t (a sketch of simulating it follows below).
  • s_φ is the score function for real samples, and the parameters of the model can be expressed as φ, including weight parameters or network parameters.
  • s_ψ is the score function for generated samples, and the parameters of the model can be expressed as ψ, including weight parameters or network parameters.
  • the diffusion process of the diffusion model can be a Gaussian diffusion, a variance preserving (VP) diffusion, a variance exploding (VE) diffusion, or another diffusion method.
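The VP and VE choices differ in how the perturbation scale grows with t; the following uses the standard parameterizations from the diffusion literature, with conventional default constants that are assumptions rather than values from the patent:

```python
import math

def vp_std(t: float, beta_min: float = 0.1, beta_max: float = 20.0) -> float:
    """Std of p(x_t | x_0) under a variance preserving (VP) schedule."""
    log_mean_coeff = -0.25 * t ** 2 * (beta_max - beta_min) - 0.5 * t * beta_min
    return math.sqrt(1.0 - math.exp(2.0 * log_mean_coeff))

def ve_std(t: float, sigma_min: float = 0.01, sigma_max: float = 50.0) -> float:
    """Std of p(x_t | x_0) under a variance exploding (VE) schedule."""
    return sigma_min * (sigma_max / sigma_min) ** t
```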
  • for s_φ, the input samples are real data, such as images, text or speech.
  • for s_ψ, the input samples are the generated samples output by the generative model G_θ.
  • G_θ can receive noise samples, use them as its input, and output multiple generated samples.
  • the training process can include updating the diffusion models s_φ and s_ψ, and updating the generative model G_θ based on the updated s_φ and s_ψ; the steps of updating s_φ and of updating s_ψ can be executed simultaneously, and they are introduced below respectively.
  • to update s_φ, the diffusion model can be used to diffuse the same input sample multiple times, and s_φ can be updated based on the diffusion data obtained from each diffusion. For example, taking the sample input to the diffusion model as an image, the diffusion process can refer to FIG. 7: each diffusion adds noise to the image obtained by the previous diffusion, until the number of diffusion steps is reached or pure noise data is obtained.
  • specifically, a diffusion time t ~ Unif[0, T] can be sampled and a real sample x_0 ~ p_d taken, and the diffused data x_t ~ p(x_t | x_0) is obtained through the probability diffusion process; the loss function is then calculated using the diffused data.
  • the loss function may be a minimized loss function, mean square error, cross entropy, logarithmic, exponential, or another loss function; here, L(φ) can regress s_φ(x_t, t) onto the gradient of the log transition density with respect to the noise added in the current diffusion step (a denoising score matching sketch follows below).
  • after L(φ) is calculated, the loss value can be used for reverse update, that is, to update the parameter φ and obtain the updated s_φ.
  • the update of s_ψ is similar, the difference being the input samples: a diffusion time t ~ Unif[0, T] is sampled, a generated sample x_0 = G_θ(z), z ~ p_prior(z) is diffused to x_t ~ p(x_t | x_0), the loss L(ψ) is calculated by minimizing the loss function, and the loss value is used for reverse update, that is, to update the parameter ψ and obtain the updated s_ψ.
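One standard way to realize such a loss is denoising score matching, sketched below (reusing `add_noise_vp` from the earlier sketch); with x_t = sqrt(alpha_bar)·x_0 + std·eps, the gradient of log p(x_t | x_0) with respect to x_t is -eps/std:

```python
import torch

def dsm_loss(score_net, x0: torch.Tensor, T: float = 1.0):
    """Denoising score matching for one batch: regress s(x_t, t) onto the
    score of the transition kernel, grad_x log p(x_t | x_0) = -eps / std."""
    t = torch.rand(x0.shape[0]) * (T - 1e-3) + 1e-3   # t ~ Unif(0, T], avoid std=0
    xt, eps, std = add_noise_vp(x0, t)                # from the earlier sketch
    target = -eps / std
    return ((score_net(xt, t) - target) ** 2).mean()
```

The same routine serves both networks; only the input batch differs (real samples for s_φ, generated samples for s_ψ).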
  • to update the generative model G_θ, the parameters of the diffusion models s_φ and s_ψ in the current iteration can be fixed, that is, φ and ψ are fixed.
  • the generated data x_0 = G_θ(z), z ~ p_prior(z) of the current training batch is obtained and diffused to x_t ~ p(x_t | x_0); the real samples are again used as the input of the updated s_φ to output its score values, and the generated samples are again used as the input of the updated s_ψ to output its score values.
  • the loss value L(θ) is then calculated, for example from the difference between the score values output by s_φ and s_ψ at each diffusion scale (one plausible instantiation is sketched below).
  • the loss value can be used to reverse-update the generative model G_θ, that is, to update the parameter θ to obtain the updated G_θ.
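A sketch of the generator step under the same assumptions as above (torch.nn.Module score networks; `add_noise_vp` from the earlier sketch); both frozen networks score the same diffused generated batch, so the loss compares them at identical points. Whether the patent's L(θ) is exactly this squared-difference form is an assumption:

```python
import torch

def update_generator(gen, s_real_net, s_fake_net, z, opt_gen, num_scales=4):
    """Fix phi and psi, diffuse a generated batch, and update theta so the
    generated-data scores match the real-data scores at each scale."""
    s_real_net.requires_grad_(False)          # fix phi
    s_fake_net.requires_grad_(False)          # fix psi
    x0 = gen(z)                               # x_0 = G_theta(z), z ~ p_prior
    loss = x0.new_zeros(())
    for t_val in torch.linspace(0.1, 1.0, num_scales):
        t = t_val.expand(x0.shape[0])
        xt, _, _ = add_noise_vp(x0, t)        # same x_t for both networks
        loss = loss + ((s_fake_net(xt, t) - s_real_net(xt, t)) ** 2).mean()
    opt_gen.zero_grad()
    loss.backward()                           # reverse-update theta only
    opt_gen.step()
    s_real_net.requires_grad_(True)
    s_fake_net.requires_grad_(True)
    return float(loss)
```

Freezing the score networks' parameters does not block the gradient that flows through their inputs back to θ, which is what drives the generator update.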
  • multiple diffusion models are set up, and the diffusion models are trained using real samples and generated samples respectively.
  • by adding noise during the diffusion process, the distance between the real samples and the generated samples is shortened, so that the output of the diffusion model trained using generated samples matches the output of the diffusion model trained using real samples as closely as possible.
  • by reverse-updating the generative model, the output value of the score function corresponding to the generated samples becomes closer to the output value of the score function corresponding to the real samples.
  • compared with adversarial training, the present application updates the generative model by matching the output values of the score functions; the optimization process is more stable, and efficient training can be achieved.
  • the method provided in the present application can be applied to training for various generation scenarios, achieving generation diversity and strong generalization ability.
  • the output of the second diffusion model obtained by training with real samples can be used as guidance, so that the first diffusion model learns the parameters of the score function more efficiently, without requiring a more complex diffusion model structure or occupying more storage space.
  • FIG. 8 is a flowchart of another generative model training method provided in the present application.
  • the difference between this process, in which the second diffusion model is pre-trained, and the process in FIG. 5 above is that, in this implementation of the present application, there is no need to train the second diffusion model synchronously: the score function values of the corresponding scales can be extracted directly from the pre-trained second diffusion model as guidance for training the generative model (a freezing sketch follows below).
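In code, using the pre-trained model as a fixed teacher only requires freezing it; this is a generic PyTorch sketch, not an interface defined by the present application:

```python
def freeze_pretrained(score_net):
    """Freeze a pre-trained diffusion/score model so phi is never updated
    while its score values guide the training of the generative model."""
    score_net.eval()
    for p in score_net.parameters():
        p.requires_grad_(False)
    return score_net
```

Gradients can still flow through the teacher's inputs to the generator even though its own parameters stay fixed.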
  • the generative model G ⁇ can adopt a variety of network structures, which can usually be selected according to the actual tasks to be performed. For example, a network that meets the hardware load range of the computing device can be built, or a commonly used network architecture can be selected, such as U-Net, CNN, or RNN. The specific network can be determined according to the actual application scenario.
  • the diffusion models s_φ and s_ψ can use the same diffusion process; s_φ is a pre-trained model and does not need to be updated. The diffusion process can be expressed as p(x_t | x_0) and can be understood as the process of adding noise to the sample, such as dx_t = f(x_t, t)dt + g(t)dW_t.
  • s_φ is the score function for real samples; the parameters of the model can be expressed as φ, including weight parameters or network parameters, and do not need to be updated in this embodiment.
  • s_ψ is the score function for generated samples; the parameters of the model can be expressed as ψ, including weight parameters or network parameters.
  • for s_ψ, the input samples are the generated samples output by the generative model G_θ.
  • noise samples can be randomly generated or received.
  • the noise samples are used as the input of the generative model G_θ to output multiple generated samples.
  • the training process can include updating the diffusion model s_ψ and updating the generative model G_θ based on the output of the pre-trained s_φ and the output of the updated s_ψ; these steps are introduced below respectively.
  • to update s_ψ: a diffusion time t ~ Unif[0, T] is sampled, a generated sample x_0 = G_θ(z), z ~ p_prior(z) is diffused to x_t ~ p(x_t | x_0), and the loss L(ψ) is calculated by minimizing the loss function; the loss value can then be used for reverse update, that is, to update the parameter ψ and obtain the updated s_ψ.
  • to update the generative model G_θ, the diffusion model parameters φ and ψ in the current iteration can be fixed.
  • the generated data x_0 = G_θ(z), z ~ p_prior(z) of the current training batch is obtained, and the diffused generated data x_t ~ p(x_t | x_0) is obtained through the probability diffusion process.
  • the real samples are again used as the input of the pre-trained diffusion model s_φ to output its score values, and the generated samples are again used as the input of the updated s_ψ to output its score values.
  • the loss value L(θ) is then calculated, for example from the difference between the score values output by s_φ and s_ψ.
  • the loss value can be used to reverse-update the generative model G_θ, that is, to update the parameter θ to obtain the updated G_θ.
  • taking the DELLE-2 model as an example, its image generator part uses a diffusion generation model, and the generation process of the DELLE-2 model usually takes a long time; therefore, the generator module of the trained DELLE-2 model can be subjected to knowledge distillation, with the implicit generative network StyleGAN-XL as the distillation target, which can greatly improve the generation speed while maintaining the generation effect.
  • a pre-trained diffusion model can be used to guide the implicit generative model, which is equivalent to using the pre-trained diffusion model as a teacher model and the implicit generative model as a student model to perform knowledge distillation, which can reduce the training cost of the generative model.
  • fewer diffusion models can be trained, which can improve the training efficiency of the generative model.
  • the present application provides a new non-adversarial implicit generative model training method based on the probability diffusion process.
  • by introducing the probability diffusion process into implicit generative model training, the score functions of the real-data and generated-data distributions are matched at multiple diffusion scales, thereby achieving efficient non-adversarial training of the implicit generative model.
  • the above introduces the generative model training method provided in the present application.
  • the trained generative model can be deployed in the cloud, local server or local terminal.
  • in the following, the data conversion method provided in the present application and the effects achieved by the above generative model training method are introduced in detail in combination with specific application scenarios.
  • FIG. 10 is a flowchart of a data conversion method provided by the present application, as described below.
  • the generative model can be deployed in a cloud device or a local device.
  • when deployed in the cloud, the user can perform input operations on a local client, such as inputting text to be converted into an image; the client sends the input to the cloud, and the cloud can receive the user's input data.
  • when deployed locally, the user can input data to the local computing device through an input device.
  • after the input data is received, it can be used as the input of the generative model, and the output result can be fed back to the user.
  • the generative model can be used for data conversion and can be trained through the steps corresponding to Figures 4 to 9 above. For details, please refer to the above introduction and will not be repeated here.
  • the generation model may include a feature extraction module, and the tasks that can be performed may include converting input text into an image, converting input speech into an image, performing data completion on an input image, converting input text into speech, or converting the resolution of an input image.
  • the generative model can be used to convert the text into a representation vector, extract features from the representation vector, and output the corresponding image based on the features.
  • the input text includes "animals, cats”
  • the text can be converted into a data type that can be processed by the neural network, that is, converted into an embedded representation, and features can be extracted from the embedded representation to generate an image including a cat.
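Illustratively, the flow can be sketched as below; `tokenizer`, `text_encoder`, and `generator` are hypothetical component names for this sketch, not interfaces defined by the present application:

```python
def text_to_image(text, tokenizer, text_encoder, generator, noise):
    """Hypothetical text-to-image flow: text -> embedding -> features -> image."""
    tokens = tokenizer(text)                  # e.g. "animals, cats"
    embedding = text_encoder(tokens)          # embedded representation
    return generator(noise, cond=embedding)   # image generated from features
```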
  • the data in the noise set may include multiple frames of images.
  • the image in the noise set may be used as the input of the generative model, and features are extracted from the image through the generative model.
  • the pixel values of the pixels to be completed are inferred based on the features to obtain the completed image.
  • the noise set may include multiple frames of speech data.
  • the speech data is used as the input of the generative model, the semantic features of the speech data are identified by the generative model, and the extracted features are processed to generate the corresponding image.
  • the output results can be fed back to the user.
  • the output result can be sent to the user's client and the output result can be displayed in the user's client.
  • the output result can be displayed on a display device set in the local computing device or a connected display device.
  • the generative model obtained by efficient training can be used to realize data conversion, which can achieve better generation effect.
  • the implicit generative model has the characteristics of being lightweight, so the deployment of the generative model does not need to occupy more storage resources, can be applied to a variety of hardware devices, and has strong generalization ability.
  • the model trained in this application is compared with the commonly used generation models.
  • GAN and WGAN are taken as examples of commonly used generative models.
  • the fitting effect is shown in Table 1.
  • the generation effect can be shown in Figure 11.
  • the trend of the loss function of the generation model can be shown in Figure 12.
  • the user can input text on the client, and the input text can include "ancient road, west wind, thin horse, misty and melodious 3D painting of ancient times".
  • the generative model can output multiple frames of corresponding output images.
  • the generative model can extract the features included in the input text, such as the entities "horse" and "ancient road", as well as the data type "painting" to be converted; the generated images have high clarity and offer a friendly user experience.
  • the user can input text on the client, and the input text may include "motorcycle sunset Chinese style painting"; the generative model may output multiple frames of corresponding output images.
  • the generative model may extract the features included in the input text, such as the entities "motorcycle" and "sunset", the data type "painting" to be converted, and the image style "Chinese style", and may combine multiple features to generate multiple frames of images.
  • the user can input text on the client, and the input text may include "future city science fiction illustration"; the generative model may output multiple frames of corresponding output images.
  • the generative model may extract the features included in the input text, such as the entity "city", the data type "illustration" to be converted, and the image style "science fiction", and may combine multiple features to generate multiple frames of images.
  • the user can input text on the client, and the input text can include "pyramid Van Gogh style"; the generative model can output multiple frames of corresponding output images.
  • the generative model can extract the features included in the input text, such as the entities "pyramid" and "Van Gogh"; the data type to be converted can default to an image with the image style "Van Gogh style", and multiple features can be combined to generate multiple frames of images.
  • the user can input text on the client, and the input text may include "a cup of coffee absorbs cosmic energy 3D painting"; the generative model may output multiple frames of corresponding output images.
  • the generative model may extract the features included in the input text, such as the entities "coffee", "universe" and "energy"; the data type to be converted may default to an image with the image style "3D painting", and multiple features may be combined to generate multiple frames of images.
  • FIG. 18 is a schematic diagram of the structure of a generative model training device provided in the present application, which includes:
  • a generating module 1801 is used to use the data in the noise set as the input of the generating model and output at least one generating sample.
  • the generating model is used to perform data conversion on the input data.
  • the noise set includes multiple frames of noise data.
  • a first diffusion module 1802 used to take at least one generated sample as an input of a first diffusion model, and output at least one first diffusion score value, wherein the first diffusion model is used to diffuse each generated sample at least once and score the diffused data;
  • the training module 1803 is used to update the generation model according to at least one first diffusion score value and at least one second diffusion score value output by the second diffusion model to obtain an updated generation model, wherein the second diffusion model is trained using a real sample set, and each sample in the real sample set includes a corresponding label.
  • the second diffusion model is used to diffuse the input data at least once and score the diffused data.
  • the parameters of the first diffusion model and the second diffusion model are different.
  • the updated generation model is used to extract features from the data input by the user in the computing device and generate corresponding data based on the extracted features.
  • the training module 1803 is specifically configured to:
  • the first diffusion model is updated using the at least one first diffusion score value to obtain an updated first diffusion model; the at least one generated sample is used as the input of the updated first diffusion model to output at least one third diffusion score value, where the at least one second diffusion score value corresponds one-to-one with the at least one third diffusion score value; the at least one second diffusion score value output by the second diffusion model is obtained, and the generative model is updated according to the loss value between each third diffusion score value and the corresponding second diffusion score value to obtain an updated generative model.
  • the apparatus further includes: a second diffusion module 1804, configured to use samples in the real sample set as inputs of a second diffusion model, and output at least one fourth diffusion score value;
  • the training module 1803 is further configured to update the second diffusion model according to at least one fourth diffusion score value to obtain an updated second diffusion model; use samples in the real sample set as inputs of the updated second diffusion model, and output at least one second diffusion score value.
  • the second diffusion model is a model pre-trained with a real sample set
  • the training module 1803 is further configured to extract at least one second diffusion score value from the second diffusion model.
  • the first diffusion model is used to: add noise to the first generated sample according to a preset step length to obtain at least one first noise sample; use the at least one first noise sample as an input of a first score function, and output at least one first diffusion score value.
  • the second diffusion model when samples in a real sample set are used as inputs to a second diffusion model, the second diffusion model is used to: add noise to samples in the real sample set according to a preset step size to obtain at least one second noise sample; and use at least one second noise sample as input to a second scoring function to obtain at least one second diffusion score value.
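As an illustrative sketch only (the names mirror modules 1801-1804 above; none of these interfaces are defined by the present application, and `score_matching_loss` refers to the earlier sketch), the modules can be wired as:

```python
class GenerativeModelTrainingDevice:
    """Toy wiring of the generating, diffusion, and training modules."""
    def __init__(self, generating_module, first_diffusion, second_diffusion):
        self.generating_module = generating_module    # cf. module 1801
        self.first_diffusion = first_diffusion        # cf. module 1802
        self.second_diffusion = second_diffusion      # cf. module 1804

    def training_step(self, noise_batch, real_batch):  # cf. module 1803's role
        samples = self.generating_module(noise_batch)
        first_scores = self.first_diffusion(samples)
        second_scores = self.second_diffusion(real_batch)
        return score_matching_loss(first_scores, second_scores)  # see above
```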
  • the generative model is used to perform one or more of the following tasks: converting input text into an image, converting input speech into an image, performing data completion on an input image, converting input text into speech, or converting the resolution of an input image.
  • FIG. 19 is a schematic diagram of the structure of a data conversion device provided by the present application, which includes:
  • the transceiver module 1901 is used to receive input data, where the input data includes data input by a user;
  • a generation module 1902 is used to use the input data as the input of a generative model to obtain an output result.
  • the generative model is used to extract features from the input data and perform modeling with the extracted features to obtain the output result; that is, it generates a preset type of data according to the extracted features.
  • the generation model is trained according to the output results of the first diffusion model and the second diffusion model.
  • the first diffusion model is trained using the output samples of the generation model before the training is completed, and the second diffusion model is trained using a real sample set. Each sample in the real sample set includes a corresponding label.
  • the second diffusion model is used to diffuse the input data at least once and score the diffused data. The parameters of the first diffusion model and the second diffusion model are different.
  • the generative model can be obtained through the training process of the generative model training method corresponding to the aforementioned Figures 4 to 17.
  • the generative model is used to perform one or more of the following tasks: converting input text into an image, converting input speech into an image, performing data completion on an input image, converting input text into speech, or converting the resolution of an input image.
  • Figure 20 is a structural diagram of another generation model training device provided in the present application, as described below.
  • the generative model training device may include a processor 2001 and a memory 2002.
  • the processor 2001 and the memory 2002 are interconnected via a line.
  • the memory 2002 stores program instructions and data corresponding to the steps in the aforementioned Figures 4 to 17.
  • Processor 2001 is used to execute the method steps performed by the generation model training device shown in any of the embodiments in Figures 4 to 17 above.
  • the generative model training device may further include a transceiver 2003 for receiving or sending data.
  • a computer-readable storage medium is also provided in an embodiment of the present application, in which a program is stored.
  • when the program runs on a computer, the computer is caused to execute the steps of the method described in the embodiments shown in the aforementioned Figures 4 to 17.
  • optionally, the generative model training device shown in the aforementioned FIG. 20 may be a chip.
  • FIG. 21 is a schematic diagram of the structure of another data conversion device provided in the present application, as described below.
  • the data conversion device may include a processor 2101 and a memory 2102.
  • the processor 2101 and the memory 2102 are interconnected via a line.
  • the memory 2102 stores program instructions and data corresponding to the steps in the aforementioned Figures 4 to 17.
  • the processor 2101 is used to execute the method steps performed by the data conversion device shown in any of the embodiments in Figures 4 to 17 above.
  • the data conversion device may further include a transceiver 2103 for receiving or sending data.
  • a computer-readable storage medium is also provided in an embodiment of the present application, in which a program is stored.
  • when the program runs on a computer, the computer is caused to execute the steps of the method described in the embodiments shown in the aforementioned Figures 4 to 17.
  • optionally, the data conversion device shown in the aforementioned FIG. 21 may be a chip.
  • An embodiment of the present application also provides a generative model training device, which can also be called a digital processing chip or chip.
  • the chip includes a processing unit and a communication interface.
  • the processing unit obtains program instructions through the communication interface, and the program instructions are executed by the processing unit.
  • the processing unit is used to execute the method steps performed by the generative model training device shown in any of the embodiments in Figures 4 to 17 above.
  • An embodiment of the present application also provides a data conversion device, which can also be called a digital processing chip or chip.
  • the chip includes a processing unit and a communication interface.
  • the processing unit obtains program instructions through the communication interface, and the program instructions are executed by the processing unit.
  • the processing unit is used to execute the method steps performed by the data conversion device shown in any of the embodiments in Figures 4 to 17 above.
  • the embodiment of the present application also provides a digital processing chip.
  • the digital processing chip integrates a circuit and one or more interfaces for implementing the above-mentioned processor 2001, or the functions of the processor 2001.
  • the digital processing chip can complete the method steps of any one or more of the above-mentioned embodiments.
  • when the digital processing chip does not integrate a memory, it can be connected to an external memory through a communication interface.
  • the digital processing chip implements the actions performed by the generation model training device in the above-mentioned embodiment according to the program code stored in the external memory.
  • the generative model training device may be a chip, and the chip includes: a processing unit and a communication unit, wherein the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin or a circuit, etc.
  • the processing unit may execute the computer execution instructions stored in the storage unit so that the chip in the server executes the generative model training method described in the embodiments shown in the above-mentioned Figures 4 to 17.
  • the storage unit is a storage unit in the chip, such as a register, a cache, etc.
  • the storage unit may also be a storage unit located outside the chip in the wireless access device end, such as a read-only memory (ROM) or other types of static storage devices that can store static information and instructions, a random access memory (RAM), etc.
  • the embodiment of the present application also provides a digital processing chip.
  • the digital processing chip integrates a circuit and one or more interfaces for implementing the above-mentioned processor 2101 or the function of the processor 2101.
  • the digital processing chip can complete the method steps of any one or more embodiments in the above-mentioned embodiments.
  • when the digital processing chip does not integrate a memory, it can be connected to an external memory through a communication interface.
  • the digital processing chip implements the actions performed by the data conversion device in the above-mentioned embodiment according to the program code stored in the external memory.
  • the data conversion device provided in the embodiment of the present application can be a chip, and the chip includes: a processing unit and a communication unit, the processing unit can be, for example, a processor, and the communication unit can be, for example, an input/output interface, a pin or a circuit, etc.
  • the processing unit can execute the computer execution instructions stored in the storage unit, so that the chip in the server executes the data conversion method described in the embodiments shown in Figures 4 to 17 above.
  • the storage unit is a storage unit in the chip, such as a register, a cache, etc.
  • the storage unit can also be a storage unit located outside the chip in the wireless access device end, such as a read-only memory (ROM) or other types of static storage devices that can store static information and instructions, a random access memory (RAM), etc.
  • Also provided in an embodiment of the present application is a computer program product which, when run on a computer, enables the computer to execute the steps performed by the generative model training device or the data conversion device in the method described in the embodiments shown in the aforementioned Figures 4 to 17.
  • the aforementioned processing unit or processor may be a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application specific integrated circuit (ASIC) or a field programmable gate array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • a general-purpose processor may be a microprocessor or any conventional processor, etc.
  • FIG. 22 is a schematic diagram of a structure of a chip provided in an embodiment of the present application.
  • the chip can be a neural network processor NPU 220, which is mounted on the host CPU (Host CPU) as a coprocessor and assigned tasks by the Host CPU.
  • the core part of the NPU is the operation circuit 2203, which is controlled by the controller 2204 to extract matrix data from the memory and perform multiplication operations.
  • the operation circuit 2203 includes multiple processing units (process engines, PEs) inside.
  • the operation circuit 2203 is a two-dimensional systolic array.
  • the operation circuit 2203 can also be a one-dimensional systolic array or other electronic circuits capable of performing mathematical operations such as multiplication and addition.
  • the operation circuit 2203 is a general-purpose matrix processor.
  • the operation circuit takes the corresponding data of matrix B from the weight memory 2202 and caches it on each PE in the operation circuit.
  • the operation circuit takes the matrix A data from the input memory 2201 and performs matrix operations with matrix B.
  • the partial results or final results of the obtained matrix are stored in the accumulator 2208.
  • the unified memory 2206 is used to store input data and output data.
  • the weight data is directly transferred to the weight memory 2202 through the direct memory access controller (DMAC) 2205.
  • the input data is also transferred to the unified memory 2206 through the DMAC.
  • the bus interface unit (BIU) 2210 is used for the interaction between the AXI bus and the DMAC and instruction fetch buffer (IFB) 2209.
  • the bus interface unit 2210 (BIU) is used for the instruction fetch memory 2209 to obtain instructions from the external memory, and is also used for the storage unit access controller 2205 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 2206 or to transfer weight data to the weight memory 2202 or to transfer input data to the input memory 2201.
  • the vector calculation unit 2207 includes multiple operation processing units, which further process the output of the operation circuit when necessary, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, etc. It is mainly used for non-convolutional/fully connected layer network calculations in neural networks, such as batch normalization, pixel-level summation, upsampling of feature planes, etc.
  • the vector calculation unit 2207 can store the processed output vector to the unified memory 2206.
  • the vector calculation unit 2207 can apply a linear function and/or a nonlinear function to the output of the operation circuit 2203, such as performing linear interpolation on the feature planes extracted by the convolution layers, or, for example, applying the function to a vector of accumulated values to generate activation values.
  • the vector calculation unit 2207 generates a normalized value, a pixel-level summed value, or both.
  • the processed output vector can be used as an activation input to the operation circuit 2203, for example, for use in a subsequent layer in a neural network.
  • An instruction fetch buffer 2209 connected to the controller 2204 is used to store instructions used by the controller 2204;
  • Unified memory 2206, input memory 2201, weight memory 2202 and instruction fetch memory 2209 are all On-Chip memories. External memories are private to the NPU hardware architecture.
  • the operation of each layer in a recurrent neural network can be performed by the operation circuit 2203 or the vector calculation unit 2207.
  • the processor mentioned in any of the above places may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the programs of the above methods.
  • the device embodiments described above are merely schematic, wherein the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the scheme of this embodiment.
  • the connection relationship between the modules indicates that there is a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
  • the technical solution of the present application is essentially or the part that contributes to the prior art can be embodied in the form of a software product, which is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, etc., including a number of instructions for a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods described in each embodiment of the present application.
  • all or part of the embodiments may be implemented by software, hardware, firmware or any combination thereof.
  • all or part of the embodiments may be implemented in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means.
  • the computer-readable storage medium may be any available medium that a computer can store, or a data storage device such as a server or a data center integrating one or more available media.
  • the available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state drive (SSD)), etc.



Description

This application claims priority to Chinese Patent Application No. 202211497412.1, filed with the China National Intellectual Property Administration on November 26, 2022 and entitled "Generative model training method, data conversion method, and apparatus", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of artificial intelligence, and in particular to a generative model training method, a data conversion method, and an apparatus.
Brief Description of Drawings
FIG. 1 is a schematic diagram of a system architecture provided in this application;
FIG. 2 is a schematic diagram of another system architecture provided in this application;
FIG. 3 is a schematic diagram of another system architecture provided in this application;
FIG. 4 is a schematic flowchart of a generative model training method provided in this application;
FIG. 5 is a schematic flowchart of another generative model training method provided in this application;
FIG. 6 is a schematic flowchart of another generative model training method provided in this application;
FIG. 7 is a schematic diagram of a diffusion effect provided in this application;
FIG. 8 is a schematic flowchart of another generative model training method provided in this application;
FIG. 9 is a schematic flowchart of another generative model training method provided in this application;
FIG. 10 is a schematic flowchart of a data conversion method provided in this application;
FIG. 11 is a schematic diagram of a generation effect provided in this application;
FIG. 12 is a schematic diagram of a training effect provided in this application;
FIG. 13 is a schematic diagram of another generation effect provided in this application;
FIG. 14 is a schematic diagram of another generation effect provided in this application;
FIG. 15 is a schematic diagram of another generation effect provided in this application;
FIG. 16 is a schematic diagram of another generation effect provided in this application;
FIG. 17 is a schematic diagram of another generation effect provided in this application;
FIG. 18 is a schematic diagram of the structure of a generative model training apparatus provided in this application;
FIG. 19 is a schematic diagram of the structure of a data conversion apparatus provided in this application;
FIG. 20 is a schematic diagram of the structure of another generative model training apparatus provided in this application;
FIG. 21 is a schematic diagram of the structure of another data conversion apparatus provided in this application;
FIG. 22 is a schematic diagram of the structure of a chip provided in this application.
Detailed Description
The technical solutions in the embodiments of this application are described below with reference to the accompanying drawings. Clearly, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative efforts shall fall within the protection scope of this application.
First, the overall workflow of an artificial intelligence system is described. The artificial intelligence theme framework is explained below from two dimensions: the "intelligent information chain" and the "IT value chain". The "intelligent information chain" reflects the process from data acquisition to processing, for example, the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output; in this process, data undergoes a refinement of "data—information—knowledge—wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (providing and processing technical implementations) to the industrial ecology of the system.
(1) Infrastructure
The infrastructure provides computing power support for the artificial intelligence system, enables communication with the outside world, and provides support through a basic platform. Communication with the outside is achieved through sensors; computing power is provided by intelligent chips, such as hardware acceleration chips including a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), an application specific integrated circuit (ASIC), or a field programmable gate array (FPGA); the basic platform includes platform assurance and support such as a distributed computing framework and networks, and may include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside to obtain data, and the data is provided to the intelligent chips in the distributed computing system provided by the basic platform for computation.
(2) Data
Data at the layer above the infrastructure indicates the data sources of the artificial intelligence field. The data involves graphics, images, speech, and text, as well as Internet-of-Things data of conventional devices, including business data of existing systems and perception data such as force, displacement, liquid level, temperature, and humidity.
(3) Data processing
Data processing usually includes data training, machine learning, deep learning, searching, reasoning, decision-making, and the like.
Machine learning and deep learning can perform symbolic and formal intelligent information modeling, extraction, preprocessing, training, and the like on data.
Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, performing machine thinking and problem solving using formalized information according to a reasoning control strategy; typical functions are searching and matching.
Decision-making refers to the process of making decisions on intelligent information after reasoning, and usually provides functions such as classification, ranking, and prediction.
(4) General capabilities
After the data processing mentioned above, some general capabilities can further be formed based on the results of the data processing, for example, an algorithm or a general system, such as translation, text analysis, computer vision processing, speech recognition, and image recognition.
(5) Intelligent products and industry applications
Intelligent products and industry applications refer to products and applications of the artificial intelligence system in various fields; they encapsulate the overall artificial intelligence solution, productize intelligent information decision-making, and realize practical applications. The application fields mainly include intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, smart cities, and the like.
The method provided in this application involves concepts related to neural networks. For ease of understanding, the relevant neural network concepts are introduced first.
(1) Neural network
A neural network can be composed of neural units. A neural unit can be an operation unit that takes x_s as input, and the output of the operation unit can be h = f(Σ_{s=1..n} W_s·x_s + b), where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, which is used to perform a nonlinear transformation on the features obtained in the neural network and convert the input signal of the neural unit into an output signal. The output signal of the activation function can serve as the input of the next convolution layer. The activation function can be a sigmoid function. A neural network is a network formed by connecting many such single neural units, that is, the output of one neural unit can be the input of another. The input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of the local receptive field; the local receptive field can be a region composed of several neural units.
(2) Convolutional neural network
A convolutional neural network (CNN) is a deep neural network with a convolutional structure. The convolutional neural network contains a feature extractor composed of convolution layers and subsampling layers, which can be regarded as a filter. The convolution layer refers to a neuron layer in the convolutional neural network that performs convolution processing on the input signal. In a convolution layer of a convolutional neural network, a neuron can be connected to only some of the neurons of adjacent layers. A convolution layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units. Neural units of the same feature plane can share weights, and the shared weights are the convolution kernel. Sharing weights can be understood as the way of extracting features being independent of position. The convolution kernel can be initialized in the form of a matrix of random size, and during the training of the convolutional neural network, the convolution kernel can obtain reasonable weights through learning. In addition, a direct benefit of sharing weights is reducing the connections between the layers of the convolutional neural network while also reducing the risk of overfitting.
(3) Generative model
A generative model refers to a model that can randomly generate observation data, especially given certain implicit parameters. It assigns a joint probability distribution to observations and labeled data sequences. In machine learning, generative models can be used to model data directly (for example, sampling data according to the probability density function of a variable) or to establish conditional probability distributions between variables; the conditional probability distribution can be formed by the generative model according to Bayes' theorem.
(4) Implicit generative model
An implicit generative model is a transformation from noise to real data parameterized by a neural network. After the generative model is trained, inputting random noise can output high-quality samples. The model is called implicit because it cannot provide an estimate of the probability density function of the data and can only be sampled from. The generative models mentioned below in this application are implicit generative models.
(5) Diffusion generative model
A diffusion generative model (generative diffusion model) is a class of probabilistic generative models. The model uses a time-dependent score function model s(x, t) (usually a deep neural network) to fit the score function of the probability distribution obtained as the data distribution evolves along a certain class of diffusion processes, thereby learning the characteristics of the data distribution. The diffusion generative model generates data by simulating the solution of a reverse stochastic differential equation.
(6) Score function
Score function: the gradient of the logarithmic density function of a probability distribution with respect to the independent variable; it is a description of the probability distribution, mathematically expressed as s(x) = ∇_x log p(x), where p(x) is the probability density function and s(x) is the score function.
(7) Generative adversarial network
Generative adversarial network (GAN) training is a mainstream training paradigm for implicit generative models. For a given implicit generative network, a discriminant network is introduced to play a game against the generative network. From an optimization perspective, this is a typical minimax optimization problem; the equilibrium of the game can correspond to various distances between distributions, such as the optimal transport distance or the Jensen–Shannon divergence. Minimax optimization is very unstable and it is difficult to reach the global optimum; GAN optimization is costly and generation diversity needs improvement.
(8) Generative model sampling
Generative model sampling refers to producing new samples from a trained generative model. For a GAN, the process feeds low-dimensional random noise into the implicit generative network and outputs a sample in one network forward pass; for a DPM, the process starts from random noise of the same dimension and goes through a denoising process corresponding to the diffusion noising process, generally requiring thousands of network forward passes to generate a sample.
(9) Probability diffusion process (diffusion process)
That is, a process in which data changes over time according to certain rules. Usually, in probability theory and statistics, a diffusion process is the solution of a stochastic differential equation. It is a continuous-time Markov process with continuous sample paths. Brownian motion, reflected Brownian motion, and the Ornstein–Uhlenbeck process are diffusion processes.
(10) Loss function
In the process of training a deep neural network, because it is hoped that the output of the deep neural network is as close as possible to the value actually desired to be predicted, the predicted value of the current network can be compared with the actually desired target value, and the weight vector of each layer of the neural network can be updated according to the difference between the two (of course, there is usually an initialization process before the first update, that is, parameters are preconfigured for each layer of the deep neural network). For example, if the predicted value of the network is high, the weight vector is adjusted to make the prediction lower, and adjustment continues until the deep neural network can predict the actually desired target value or a value very close to it. Therefore, it is necessary to predefine "how to compare the difference between the predicted value and the target value"; this is the loss function or objective function, an important equation for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a greater difference, so training the deep neural network becomes a process of reducing this loss as much as possible. The loss function can usually include loss functions such as mean squared error, cross entropy, logarithmic, and exponential losses. For example, the mean squared error can be used as the loss function, defined as L = (1/N)·Σ_i (y_i − ŷ_i)²; the specific loss function can be selected according to the actual application scenario.
(11) Back propagation algorithm
A convolutional neural network can use the back propagation (BP) algorithm to correct the magnitudes of the parameters in the initial network model during training, so that the reconstruction error loss of the model becomes smaller and smaller. Specifically, the input signal is propagated forward until the output produces an error loss, and the parameters in the initial model are updated by back-propagating the error loss information, so that the error loss converges. The back propagation algorithm is a back propagation movement dominated by the error loss, aiming to obtain optimal model parameters, for example, a weight matrix.
(12) Gradient: the derivative vector of the loss function with respect to the parameters.
The method provided in the embodiments of this application can be executed on a server and can also be executed on a terminal device. The server may include a local server, an integrated server, a distributed server, or the like. The terminal device may include a mobile phone, a tablet personal computer (TPC), a media player, a smart TV, a laptop computer (LC), a personal digital assistant (PDA), a personal computer (PC), a camera, a video camera, a smart watch, a wearable device (WD), an autonomous vehicle, or the like, which is not limited in the embodiments of this application.
The system architecture provided in the embodiments of this application is introduced below.
Referring to FIG. 1, an embodiment of this application provides a system architecture 200. As shown in the system architecture 200, a data collection device 260 can be used to collect training data. After the data collection device 260 collects the training data, the training data is stored in a database 230, and a training device 220 obtains a generative model 201 by training based on the training data maintained in the database 230.
The following describes how the training device 220 obtains the generative model 201 based on the training data. Exemplarily, the training device 220 builds the generative model and updates the parameters of the generative model, thereby completing the training of the generative model 201. For details, refer to the training method described later.
The generative model 201 in this embodiment of the application may specifically be a neural network. It should be noted that, in actual applications, the training data maintained in the database 230 does not necessarily all come from the collection of the data collection device 260 and may also be received from other devices. In addition, the training device 220 does not necessarily train the generative model 201 entirely based on the training data maintained by the database 230 and may also obtain training data from the cloud or elsewhere for model training; the above description should not be taken as a limitation on the embodiments of this application.
The generative model 201 obtained through training by the training device 220 can be applied to different systems or devices, such as the execution device 210 shown in FIG. 1. The execution device 210 may be a terminal, such as a mobile phone terminal, a tablet computer, a laptop, an augmented reality (AR)/virtual reality (VR) device, a vehicle-mounted terminal, or a TV, and may also be a server, the cloud, or the like. In FIG. 1, the execution device 210 is configured with a transceiver 212, which may include an input/output (I/O) interface or another wireless or wired communication interface for data interaction with external devices; taking the I/O interface as an example, a user can input data to the I/O interface through a client device 240.
When the execution device 210 preprocesses the input data, or when the calculation module 211 of the execution device 210 performs calculation or other related processing, the execution device 210 can call data, code, and the like in a data storage system 250 for the corresponding processing, and can also store the data, instructions, and the like obtained by the corresponding processing into the data storage system 250.
Finally, the I/O interface returns the processing result to the client device 240, thereby providing it to the user.
It is worth noting that the training device 220 can generate corresponding generative models 201 based on different training data for different goals or different tasks; the corresponding generative models 201 can be used to achieve the above goals or complete the above tasks, thereby providing the user with the desired results.
In the case shown in FIG. 1, the user can manually give the input data, and this manual input can be operated through the interface provided by the transceiver 212. In another case, the client device 240 can automatically send input data to the transceiver 212; if requiring the client device 240 to automatically send input data needs the user's authorization, the user can set the corresponding permission in the client device 240. The user can view the result output by the execution device 210 on the client device 240, and the specific presentation form can be display, sound, action, or another specific manner. The client device 240 can also serve as a data collection end, collecting the input data fed into the transceiver 212 and the output result of the transceiver 212 as new sample data, as shown in the figure, and storing them in the database 230. Of course, collection may also be done without the client device 240; instead, the transceiver 212 directly stores the input data fed into the transceiver 212 and the output result of the transceiver 212 as new sample data into the database 230.
It is worth noting that FIG. 1 is only a schematic diagram of a system architecture provided in an embodiment of this application, and the positional relationships among the devices, components, and modules shown in the figure do not constitute any limitation. For example, in FIG. 1, the data storage system 250 is an external memory relative to the execution device 210; in other cases, the data storage system 250 can also be placed in the execution device 210.
Exemplarily, the system architecture to which the generative model building method provided in this application is applied may be as shown in FIG. 2. In the system architecture 300, a server cluster 310 is implemented by one or more servers, optionally cooperating with other computing devices, such as data storage, routers, and load balancers. The server cluster 310 can use data in the data storage system 250 or call program code in the data storage system 250 to implement the steps of the generative model building method provided in this application.
Users can operate respective user devices (such as a local device 301 and a local device 302) to interact with the server cluster 310. Each local device can represent any computing device, such as a personal computer, a computer workstation, a smartphone, a tablet, a smart camera, a smart car or other types of cellular phones, media consumption devices, wearable devices, set-top boxes, and game consoles.
Each user's local device can interact with the server cluster 310 through a communication network of any communication mechanism/communication standard. The communication network can be a wide area network, a local area network, a point-to-point connection, or any combination thereof. Specifically, the communication network may include a wireless network, a wired network, or a combination of both. The wireless network includes but is not limited to any one or a combination of the fifth-generation mobile communication technology (5G) system, long term evolution (LTE) system, global system for mobile communication (GSM), code division multiple access (CDMA) network, wideband code division multiple access (WCDMA) network, wireless fidelity (WiFi), Bluetooth, Zigbee, radio frequency identification (RFID), long range (Lora) wireless communication, and near field communication (NFC). The wired network may include an optical fiber communication network or a network composed of coaxial cables.
In another implementation, one or more aspects of the execution device 210 can be implemented by each local device; for example, the local device 301 can provide local data for the execution device 210 or feed back calculation results.
It should be noted that all functions of the execution device 210 can also be implemented by a local device. For example, the local device 301 implements the functions of the execution device 210 and provides services for its own user, or provides services for the user of the local device 302.
More specifically, the generative model training method provided in this application can be deployed on a local server, in the cloud, or on a local terminal. The trained generative model can be deployed on a terminal, a local server, a cloud server, or the like. For example, the training phase provided in this application can be deployed in a server, and the server trains the generative model to obtain the trained generative model. The trained model can be deployed on a cloud platform, and the user can obtain the required data through the cloud platform by performing input operations on a client.
AI services and products in the cloud field embody both the on-demand use and purchase characteristics of cloud services and the abstract, diverse, and widely applied characteristics of AI technology. There are two mainstream types of AI services in the cloud field: one is the platform-as-a-service (PaaS) type of basic AI development platform service, and the other is the software-as-a-service (SaaS) type of AI application cloud service.
For the first type, the basic AI development platform service, a public cloud service provider, relying on its sufficient underlying resource support and upper-layer AI algorithm capabilities, provides users with a basic AI development platform. The AI development frameworks and various AI algorithms built into the basic AI development platform allow users to quickly build and develop AI models or AI applications that meet individual needs on the platform.
For the second type, the AI application cloud service, a public cloud service provider provides general AI application cloud services through the cloud platform, enabling users to use AI capabilities in various application scenarios with zero barriers.
For example, a public cloud basic AI development platform is a PaaS cloud service in the cloud platform; it is a software platform, based on the large amount of basic resources and software capabilities owned by the public cloud service provider, that assists users (also called tenants, AI developers, etc.) in building, training, and deploying AI models and in developing and deploying AI applications.
Exemplarily, the generative model training method provided in this application can be deployed to be executed by a server, and the trained generative model can be deployed on a cloud platform and called by users for a fee in the form of an application programming interface (API). Specifically, for example, the method provided in this application can be deployed in the cloud platform as a service for users, with an API through which the service can be called; the user can call the service through the API and input information about the data to be generated, such as text to be converted into an image, text to be converted into speech, or an image whose resolution is to be improved, and the service generates the required data for the user, improving the user experience.
As shown in FIG. 3, the form of interaction between users and the basic AI development platform mainly includes: the user logs in to the cloud platform through a client web page, selects and purchases the cloud service of the basic AI development platform in the cloud platform, and can then carry out full-process AI services based on the functions provided by the basic AI development platform.
When users develop and train AI models on the basic AI development platform, this is done based on the basic resources (mainly computing resources, such as CPU, GPU, and NPU) in the cloud service provider's data center.
Usually, the basic resources supporting any process in the AI platform may be distributed on different physical devices; that is, the hardware devices that actually execute a process are usually a server cluster in the same data center, or server clusters distributed in different data centers.
These data centers may be the cloud service provider's central cloud data centers, or edge data centers provided by the cloud service provider to users. For example, in a scenario combining a public cloud and a private cloud, the resources in the public cloud are used to run the model training and model management functions provided in the basic AI development platform, and the resources in the private cloud are used to run the data storage and data preprocessing functions provided in the basic AI development platform, which can provide stronger security for the user's data. In this scenario, the public cloud resources can come from the central cloud data center, and the private cloud resources can come from the edge data center.
It can be understood that the AI platform can be independently deployed on a server or a virtual machine in a data center of the cloud environment, and the AI platform can also be deployed in a distributed manner on multiple servers in a data center or on multiple virtual machines in a data center.
In another embodiment, the AI platform provided in this application can also be deployed in a distributed manner in different environments. The AI platform provided in this application can be logically divided into multiple parts, each with a different function. For example, one part of the AI platform 100 can be deployed in a computing device in an edge environment (also called an edge computing device), and another part can be deployed in a device in a cloud environment. The edge environment is an environment geographically close to the user's terminal computing device and includes edge computing devices, such as edge servers and edge stations with computing capabilities. The parts of the AI platform 100 deployed in different environments or devices cooperate to provide users with functions such as training AI models.
Generative models are applied in a variety of scenarios; GANs and diffusion generative models are commonly used generative models and can be used for, for example, high-resolution image generation, text-to-image conversion, text-to-speech, and speech generation.
Some commonly used generative model training methods are taken as examples.
Implicit generative models represented by GANs introduce a discriminant network to play a game against the generative network, and learn the transformation from noise to the data distribution through adversarial training. The advantages are high generation quality, relatively small models, fast generation, and low deployment cost; the disadvantages are that, with the introduction of adversarial training, the optimization process is unstable, training collapses easily, the models are sensitive to hyperparameters, and generation diversity is insufficient. For example, SOTA implicit generative models design larger implicit generative networks and train them with traditional adversarial training; however, they are very sensitive to hyperparameter selection, difficult to optimize, and require complex regularization techniques to stabilize training.
Diffusion generative models rely on a probability diffusion process, bringing two distributions closer by adding noise and reducing the learning difficulty. The noising process makes the data continuously lose its original information until it finally becomes white noise. When generating data, the diffusion probabilistic model learns the reverse of the noising process, obtaining a denoising process. The denoising process gradually restores information to the data until it is finally restored to normal, clean data. The advantages of diffusion generative models are stable training and the ability to learn the data distribution accurately, achieving impressive results on many generation tasks. Their disadvantages are also obvious: the computational overhead is high, the best models usually require 1 GB+ of memory for deployment, and N iterative steps are needed at sampling time. In practice, the order of magnitude of N is usually several hundred to several thousand; when N = 1000, the overhead of a diffusion probabilistic model is 1000 times that of a generative adversarial network.
For example, a SOTA diffusion generative model diffuses the data through a diffusion process and uses a score function network to fit the score function of the diffused data distribution at multiple diffusion levels. This solution finally outputs a score function network, and data generation is achieved by repeatedly iterating the score function network. However, the model (the score function network) has many parameters, generally requiring 1 GB+ of storage space, and generation is slow: generating a single image requires thousands of forward passes of a large network.
As another example, implicit-model-accelerated diffusion models attempt to introduce an implicit generative model as a component into the diffusion model structure: a conditional implicit generative model is used to model the reverse process in the diffusion probabilistic model, improving the generation efficiency of the diffusion probabilistic model. This solution models the long-distance diffusion process in the probability diffusion model with a conditional generative network and can thus partially alleviate the need for repeated iteration when the probability diffusion model runs. However, the conditional generative network still relies on traditional adversarial training, which is unstable and prone to training failure.
As yet another example, diffusion-assisted adversarial training (Diffusion-GAN) attempts to introduce the diffusion process into implicit generative model training, achieving better generation results than traditional GAN methods. This method diffuses the data distribution and the generated distribution through a random process and performs adversarial training on the diffused data. However, this approach still uses adversarial training, which remains unstable and may fail.
Therefore, this application provides a generative model training method that can obtain a generative model that generates well, has high diversity, trains stably, samples efficiently, and is convenient to deploy. It can be understood that this application proposes a non-adversarial implicit generative model training framework based on the diffusion process, which can overcome the defects of existing solutions and has great value.
下面结合前述的架构,对本申请提供的生成模型训练方法进行介绍。
参阅图4,本申请提供的一种生成模型训练方法,如下所述。
401、获取噪声集合。
其中,该噪声集合中可以包括多帧噪声数据,该多帧噪声数据可以包括随机生成的数据,也可以包括接收到的其他设备发送的数据,还可以包括用户随机输入的数据等。
通常,该噪声集合中的数据包括待训练的生成模型的输入数据的类型。如生成模型的输入数据类型可以包括文本、图像、语音或者其他数据类型等,相应地噪声集合中的噪声数据的数据类型也可以包括文本、图像、语音或者其他数据类型等。
如生成模型的输入数据类型为文本,则噪声数据集合中的噪声数据的数据类型也为文本;若生成模型的输入数据类型为图像,则噪声数据集合中的噪声数据的类型也可以包括图像;若生成模型的输入数据包括语音,则噪声数据集合中的噪声数据的类型也可以包括语音。
通常,可以对生成模型进行多次迭代更新,下面示例性地,以其中一次迭代更新为例进行介绍,即以下步骤402-步骤405可以迭代执行。
402、将噪声集合中的数据作为生成模型的输入,输出至少一个生成样本。
生成模型中可以设置针对图像,语音,文本的特征提取器,在获取到噪声集合后,即可将噪声集合中的噪声数据作为生成模型的输入,通过生成模型从输入的噪声数据中提取特征,并根据特征生成对应的数据,输出至少一个生成样本。当然,也可以通过其他特征提取模型从输入数据中提取到特征后输入至生成模型以完成生成步骤。
通常,该生成模型可以执行的任务可以包括将输入的文本转换为图像、将输入的语音转换为图像、对输入的图像进行数据补全、将输入的文本转换为语音或者转换输入的图像的分辨率中的一种或者多种。
例如,若生成模型用于将输入的文本转换为图像,噪声集合中的数据包括多帧文本数据,则可以通过生成模型将文本转换为表征向量,并从表征向量中提取图像特征,根据该图像特征生成对应的图像。如若输入的文本包括“动物、猫”,则可以将文本转换为神经网络可处理的数据类型,即转换为嵌入表征(embedding),并从该嵌入表征中提取图像特征,并生成包括了猫的图像。
又例如,若生成模型用于将输入的图像进行补全,噪声集合中的数据可以包括多帧图像。可以将噪声集合中的图像作为生成模型的输入,通过该生成模型从图像中提取特征,并根据该特征推理需补全的像素点的像素值,得到补全后的图像。
又例如,若生成模型用于将输入的语音转换为图像,噪声集合中可以包括多帧语音数据。将语音数据作为生成模型的输入,通过生成模型识别语音数据的语义特征,并根据提取到的特征进行处理,生成对应的图像。
403、将至少一个生成样本作为第一扩散模型的输入,输出至少一个第一扩散得分值。
其中,在训练生成模型的过程中,需引入至少两个扩散模型,此处为了便于区分,成为第一扩散模型和第二扩散模型。
在得到生成模型输出的至少一个生成样本后,可以将该至少一个生成样本作为第一扩散模型的输入,该第一扩散模型可以用于对输入的生成样本进行至少一次扩散,并对每次扩散后的数据通过得分函数输出得分,得到至少一个扩散的分值,为便于区分称为第一扩散得分值。该第一扩散得分值可以包括在进行扩散时所使用的对数概率密度函数对自变量的梯度,可以理解为表示概率分布。
404、获取第二扩散模型输出的至少一个第二扩散得分值。
其中,第一扩散模型和第二扩散模型的区别通常在于模型参数不相同。如在分别训练第一扩散模型和第二扩散模型的过程中,所使用的输入数据不相同,所实现的训练效果不相同,因此第一扩散模型和第二扩散模型的参数不相同。
可选地,该第二扩散模型可以是使用真实样本集合预训练后的模型,也可以是在训练生成模型的过程中将真实样本集合作为输入同步训练的模型。
应理解,本申请所提及的真实样本集合,可以理解为样本对。例如,若生成模型用于基于文本生成图像,则真实样本集合中可以包括多对样本对,每对样本对可以包括文本以及文本对应的一帧或多帧图像;若生成模型用于基于语音生成图像,则每对样本对可以包括语音以及对应的一帧或多帧图像;若生成模型用于提高图像分辨率,则每对样本对可以包括低分辨率图像以及高分辨率图像等,以此类推。
若该第二扩散模型为预训练得到的模型,则第一扩散模型训练时的扩散步长可以与该第二扩散模型训练时的步长相同,从而可以从该第二扩散模型中直接提取与第一扩散模型中各个扩散步长分别尺度相同的第二扩散得分值。
若该第二扩散模型是在训练生成模型的过程中同步训练,则可以将真实样本集合中的样本作为该第二扩散模型的输入,输出至少一个第二扩散得分值。其中第一扩散模型与第二扩散模型所使用的扩散步长可以相同。
通常,第一扩散模型和第二扩散模型分别对输入的样本进行扩散时,可以使用相同的步长进行扩散,在每次扩散过程中,加噪后得到的数据的尺度通常也相同,以便于后续可以在相同尺度计算损失值,提高训练稳定性,提高模型收敛效率。
例如,以任意一个生成样本的扩散过程为例,为便于区分称为第一生成样本。第一扩散模型可以按照预设步长对该第一生成样本进行加噪,得到至少一个第一噪声样本,随后通过第一扩散模型中的第一得分函数输出对应的至少一个第一扩散的分值。相应地,当将真实样本集合中的样本作为第二扩散模型的输入时,第二扩散模型可以按照预设步长对该样本进行加噪,得到至少一个第二噪声样本;随后将该至少一个第二噪声样本作为第二得分函数的输入,得到至少一个第二扩散得分值。
405、根据至少一个第一扩散得分值以及至少一个第二扩散得分值对生成模型进行更新,得到更新后的生成模型。
其中,可以通过至少一个第一扩散得分值对第一扩散模型进行更新,得到更新后的第一扩散模型。随后可以将生成样本作为更新后的第一扩散模型的输入,输出至少一个新的扩散得分值,为便于区分称为第三扩散得分值。该至少一个第三扩散得分值和至少一个第二扩散得分值一一对应,可以根据至少一个第三扩散得分值中的每个第三扩散得分值以及对应的第二扩散值之间的差值计算损失值,并根据该损失值对生成模型进行反向更新,得到更新后的生成模型。
其中,可以进行多次迭代训练,即执行多次步骤402-步骤405,直至模型收敛。模型的收敛条件可以包括但不限于以下一种或者多种:迭代次数达到预设次数、迭代时长达到预设时长、第一扩散模型和第二扩散模型输出的差值在预设范围内、损失函数的变化值不大于预设变化值等,具体可以根据实际应用场景选择。
本申请实施方式中,可以设置一个使用待训练生成模型的输出数据来进行更新的第一扩散模型,第二扩散模型为使用真实样本集合进行训练得到的模型。基于第一扩散模型和第二扩散模型的输出之间的差值来计算损失值并更新生成模型,相当于将第二扩散模型作为教师模型,第一扩散模型作为学生模型进行知识蒸馏,因此无需进行对抗训练,可以实现更稳定更高效的训练。当第一扩散模型和第二扩散模型针对输入的样本分别进行了多次扩散时,本申请提供的方法可以在多个扩散尺度上匹配真实数据与生成数据分布的之间的得分函数,实现隐式生成模型的高效非对抗训练。
在一种可能的实施方式中,若在训练生成模型的过程中更新第二扩散模型,则在更新生成模型之前,还可以对第二扩散模型进行更新。具体地,可以将真实样本集合中的样本作为第二扩散模型的输入,输出至少一个第四扩散得分值;根据所述至少一个第四扩散得分值更新所述第二扩散模型,得到更新后的第二扩散模型。随后将真实样本集合中的样本作为更新后的第二扩散模型的输入,输出至少一个第二扩散得分值。相当于在训练生成网络的过程中,可以通过第二扩散模型来实时估计得分函数,利用学习到的得分函数更新生成网络,实现对生成网络的稳定更新。
前述对本申请提供的生成模型训练方法的流程进行了介绍,为便于理解,下面结合具体的应用场景,对本申请提供的生成模型训练方法的流程进行介绍。
首先,本申请提供的生成模型训练方法中,可以同步训练第二扩散模型,也可以在训练生成模型之前进行预训练得到第二扩散模型,下面从不同的训练方式的维度对本申请提供的生成模型训练方法进行介绍。
一、同步训练第二扩散模型
参阅图5,本申请提供的另一种生成模型训练方法的流程示意图。
首先,初始化模型参数,包括:生成模型Gθ,扩散模型
随后进行迭代步骤,直至模型收敛。其中,在每次迭代过程中,可以更新扩散模型以及基于更新后的扩散模型以及更新生成模型Gθ。判断模型收敛的方式可以包括多种,如迭代次数达到预设次数、迭代时长达到预设时长、扩散模型输出的差值在预设范围内、损失函数的变化值不大于预设变化值等,具体可以根据实际应用场景选择。
随后对收敛的生成模型Gθ进行部署,如将生成模型Gθ部署于云平台,通过客户端的形式为用户提供服务器,或者将生成模型Gθ部署于用户的本地服务器或者终端,用户可以在本地进行数据转换。
结合图6,对训练过程进行详细介绍。
首先,给定初始化数据。
生成模型Gθ可以采用多种网络结构,通常可以根据实际所需执行的任务来选择,如可以构建符合计算设备硬件负载范围的网络,也可以选择常用的网络架构,如U-Net、CNN或者RNN等网络,具体可以根据实际应用场景确定。
扩散模型以及可以使用相同的扩散过程,如表示为p(xt|x0),扩散过程可以理解为对样本进行加噪的过程,如表示为d Xt=f(x,t)dt+g(t)dWt
为真实样本得分函数,模型的参数可以表示为φ,包括权重参数或者网络参数等。
为生成样本得分函数,模型的参数可以表示为ψ,包括权重参数或者网络参数等。
例如,可以采集图像数据集合,选取StyleGAN 3网络架构作为生成模型的架构,扩散模型的扩散过程可以选择高斯扩散、方差保持(variance preserving,VP)或者方差爆炸(variance exploding,VE)等扩散方式,扩散步长可以选择T=1000步,真实数据的得分函数和生成数据的得分函数可以选为U-Net架构的网络,并初始化各个模型的参数。
Training is then performed. During training, the input samples required by the diffusion models sφ and sψ must first be obtained. The input samples of sφ are real data, such as images, text, or speech. The input samples of sψ are generated samples output by the generative model Gθ: Gθ may receive noise samples, which are used as the input of Gθ to output multiple generated samples.
The training flow may include updating the diffusion models sφ and sψ, and updating the generative model Gθ based on the updated diffusion models. The steps of updating the two diffusion models may be executed synchronously; each is introduced below.
1. Updating the diffusion model sφ
The diffusion model sφ may diffuse the same input sample multiple times, and sφ is updated based on the diffused data obtained in each diffusion. For example, taking an image input to the diffusion model as an example, the diffusion process may be seen in FIG. 7: each diffusion may add noise to the image obtained in the previous diffusion, until the number of diffusions is reached or pure noise data is obtained.
A diffusion time t~Unif[0,T] may be sampled at random, a real sample x0~pd is taken, and the diffused data xt~p(xt|x0) is obtained through the probabilistic diffusion process.
A loss function is then computed from the diffused data; the loss function may specifically be a least-squares (minimization) loss, mean squared error, cross-entropy, logarithmic loss, exponential loss, or the like.
Illustratively, taking a least-squares loss as an example, it may be expressed as:
L(φ) = E_{t~Unif[0,T], x0~pd, xt~p(xt|x0)}[ ||sφ(xt, t) − ∇xt log p(xt|x0)||² ]
where the gradient ∇xt log p(xt|x0) is taken with respect to the noise added in the current diffusion step.
After L(φ) is computed, this loss value can be used for a backward update, i.e., the parameter φ is updated, to obtain the updated sφ.
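A hedged sketch of this update as one denoising score matching step, assuming the Gaussian VP kernel, under which the regression target ∇xt log p(xt|x0) = −eps/σt is available in closed form:

```python
import torch

def dsm_step(score_model, opt, x0, beta):
    """One denoising score matching update: noise x0 to a random step t, then
    regress the model's score onto the analytic score of the Gaussian kernel."""
    T = beta.shape[0]
    t = torch.randint(1, T + 1, (x0.shape[0],))
    alpha_bar = torch.cumprod(1.0 - beta, dim=0)[t - 1].unsqueeze(-1)
    eps = torch.randn_like(x0)
    sigma = (1.0 - alpha_bar).sqrt()
    x_t = alpha_bar.sqrt() * x0 + sigma * eps
    target = -eps / sigma                    # analytic ∇xt log p(xt|x0)
    loss = ((score_model(x_t, t) - target) ** 2).sum(dim=1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```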
2. Updating the diffusion model sψ
The update process of the diffusion model sψ is similar to that of sφ; the difference lies in the input samples.
A diffusion time t~Unif[0,T] is sampled at random, a generated sample x0 = Gθ(z), z~pprior(z) is taken, and the diffused data xt~p(xt|x0) is obtained through the probabilistic diffusion process. A loss value is computed through a least-squares loss, for example expressed as:
L(ψ) = E_{t~Unif[0,T], z~pprior, xt~p(xt|x0=Gθ(z))}[ ||sψ(xt, t) − ∇xt log p(xt|x0)||² ]
After L(ψ) is computed, this loss value can be used for a backward update, i.e., the parameter ψ is updated, to obtain the updated sψ.
It should be noted that this application does not limit the update order of the diffusion models sφ and sψ: sφ may be updated first, sψ may be updated first, or sφ and sψ may be updated simultaneously, which may be adjusted according to the actual application scenario; this application imposes no limitation on this.
3. Updating the generative model Gθ
After the diffusion models sφ and sψ are updated, the parameters of sφ and sψ in the current iteration can be fixed, i.e., φ and ψ are fixed. The generated data of the current training batch, x0 = Gθ(z), z~pprior(z), is obtained, and the diffused generated data xt~p(xt|x0) is obtained through the probabilistic diffusion process. Real samples are again used as the input of the updated diffusion model sφ, outputting sφ(xt, t); the generated samples are again used as the input of the updated sψ, outputting sψ(xt, t).
A loss value is then computed, for example expressed as:
L(θ) = E_{t~Unif[0,T], z~pprior, xt~p(xt|x0=Gθ(z))}[ ||sφ(xt, t) − sψ(xt, t)||² ]
After L(θ) is computed, this loss value can be used to update the generative model Gθ backward, i.e., the parameter θ is updated, to obtain the updated Gθ.
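Putting the three updates together, one possible synchronous iteration, reusing dsm_step from the sketch above; the pairing of score evaluations (both evaluated on the same diffused generated batch) and the optimizer settings are assumptions, not the reference implementation of this application:

```python
import torch

def generator_step(G, s_real, s_fake, z, opt_G, beta):
    """Update Gθ by matching sψ to sφ; here both scores are evaluated on the
    same diffused generated batch, one common instantiation of the matching."""
    x0 = G(z)
    T = beta.shape[0]
    t = torch.randint(1, T + 1, (x0.shape[0],))
    alpha_bar = torch.cumprod(1.0 - beta, dim=0)[t - 1].unsqueeze(-1)
    x_t = alpha_bar.sqrt() * x0 + (1.0 - alpha_bar).sqrt() * torch.randn_like(x0)
    # φ and ψ are fixed for this step: only G's parameters live in opt_G.
    loss = ((s_real(x_t, t) - s_fake(x_t, t)) ** 2).sum(dim=1).mean()
    opt_G.zero_grad()
    loss.backward()
    opt_G.step()
    return loss.item()

def train_iteration(G, s_real, s_fake, opt_real, opt_fake, opt_G, real_batch, z, beta):
    """One synchronous iteration: 1) update sφ on real data, 2) update sψ on
    generated data (G frozen via detach), 3) update Gθ by score matching."""
    dsm_step(s_real, opt_real, real_batch, beta)
    dsm_step(s_fake, opt_fake, G(z).detach(), beta)
    return generator_step(G, s_real, s_fake, z, opt_G, beta)
```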
In the embodiments of this application, multiple diffusion models are set up and trained respectively with real samples and generated samples. Adding noise in the diffusion process draws the real samples and generated samples closer, so that the output of the diffusion model trained with generated samples matches, as far as possible, the output of the diffusion model trained with real samples. By updating the generative model backward, the output value of the score function corresponding to the generated samples is brought closer to that corresponding to the real samples. Compared with adversarial training, this application updates the generative model by matching the output values of score functions, so the optimization process is more stable and efficient training can be achieved. The method provided in this application can be applied to training in a variety of generation scenarios, achieving generation diversity and strong generalization. Furthermore, training in the manner provided in this application can use the output of the second diffusion model, trained on real samples, as guidance, enabling the first diffusion model to learn the parameters of the score function more efficiently, without a more complex diffusion model structure and without occupying more storage space.
II. Pre-training the second diffusion model
Referring to FIG. 8, a schematic flowchart of another generative model training method provided in this application.
The difference between this flow and the flow in FIG. 5 described above is that, in this embodiment, the second diffusion model need not be trained synchronously; the score function values at the corresponding scales can be extracted directly from the pre-trained second diffusion model and used as guidance to train the generative model.
The specific training flow is introduced below with reference to FIG. 9.
First, initialization data is given.
The generative model Gθ may adopt a variety of network structures, usually selected according to the task actually to be performed. For example, a network that fits the hardware load range of the computing device may be constructed, or a commonly used network architecture such as U-Net, CNN, or RNN may be selected, which may be determined according to the actual application scenario.
The diffusion models sφ and sψ may use the same diffusion process; the diffusion model sφ is a pre-trained model and need not be updated. The diffusion process may be expressed as p(xt|x0) and can be understood as the process of adding noise to a sample, for example expressed as dxt = f(xt, t)dt + g(t)dWt.
sφ is the score function for real samples; the model's parameters may be denoted φ, including weight parameters or network parameters. In this embodiment, sφ need not be updated.
sψ is the score function for generated samples; the model's parameters may be denoted ψ, including weight parameters or network parameters.
For example, an image data set may be collected, the StyleGAN 3 network architecture may be selected as the architecture of the generative model, the diffusion process of the diffusion models may be Gaussian diffusion, the number of diffusion steps may be T = 1000, the score functions for real data and generated data may include U-Net-architecture networks, and the parameters of the generative model and of sψ are initialized.
Training is then performed. During training, the input samples required by the diffusion model sψ must first be obtained. The input samples of sψ are generated samples output by the generative model Gθ: samples may be generated at random or noise samples may be received, and the noise samples are used as the input of Gθ to output multiple generated samples.
The training flow may include updating the diffusion model sψ, and updating the generative model Gθ based on the output of the pre-trained sφ and the output of the updated diffusion model sψ. These steps may be executed synchronously; each is introduced below.
1. Updating the diffusion model sψ
A diffusion time t~Unif[0,T] is sampled at random, a generated sample x0 = Gθ(z), z~pprior(z) is taken, and the diffused data xt~p(xt|x0) is obtained through the probabilistic diffusion process. A loss value is computed through a least-squares loss, for example expressed as:
L(ψ) = E_{t~Unif[0,T], z~pprior, xt~p(xt|x0=Gθ(z))}[ ||sψ(xt, t) − ∇xt log p(xt|x0)||² ]
After L(ψ) is computed, this loss value can be used for a backward update, i.e., the parameter ψ is updated, to obtain the updated sψ.
2. Updating the generative model Gθ
After the diffusion model sψ is updated, the parameters of sφ and sψ in the current iteration can be fixed, i.e., φ and ψ are fixed. The generated data of the current training batch, x0 = Gθ(z), z~pprior(z), is obtained, and the diffused generated data xt~p(xt|x0) is obtained through the probabilistic diffusion process. Real samples are again used as the input of the pre-trained diffusion model sφ, outputting sφ(xt, t); the generated samples are again used as the input of the updated sψ, outputting sψ(xt, t).
A loss value is then computed, for example expressed as:
L(θ) = E_{t~Unif[0,T], z~pprior, xt~p(xt|x0=Gθ(z))}[ ||sφ(xt, t) − sψ(xt, t)||² ]
After L(θ) is computed, this loss value can be used to update the generative model Gθ backward, i.e., the parameter θ is updated, to obtain the updated Gθ.
For example, take the DALL·E 2 model, whose image generator employs a diffusion generative model; the generation process of DALL·E 2 usually takes a long time. Knowledge distillation can therefore be performed on the generator module of a trained DALL·E 2 model, with the implicit generative network StyleGAN-XL as the distillation target; generation speed can be greatly increased while the generation quality is maintained.
In the embodiments of this application, a pre-trained diffusion model can be used to guide the implicit generative model, which is equivalent to knowledge distillation with the pre-trained diffusion model as the teacher model and the implicit generative model as the student model, reducing the training overhead of the generative model. Moreover, fewer diffusion models need to be trained during the training of the generative model, improving its training efficiency.
It can be understood that this application provides a completely new non-adversarial training method for implicit generative models based on the probabilistic diffusion process: by introducing the probabilistic diffusion process into implicit generative model training and matching the score functions of the real-data and generated-data distributions at multiple diffusion scales, efficient non-adversarial training of implicit generative models is achieved.
The foregoing has introduced the generative model training method provided in this application. The trained generative model may be deployed in the cloud, on a local server, or on a local terminal. The data conversion method provided in this application and the effects achieved by the foregoing training method are introduced in detail below with reference to specific application scenarios.
Referring to FIG. 10, a schematic flowchart of a data conversion method provided in this application, as described below.
1001. Receive a user's input data.
The generative model may be deployed in a cloud device or a local device. When deployed in a cloud device, the user may perform input operations on a local client, such as entering text to be converted into an image; the client sends the input to the cloud, and the cloud receives the user's input data. When deployed in a local computing device, the user may input data to the local computing device through an input device.
1002. Use the input data as the input of the generative model, obtain an output result, and feed it back to the user.
After the input data is received, it can be used as the input of the generative model, and the output result is fed back to the user. The generative model may be used for data conversion and may be trained through the steps corresponding to FIGS. 4 to 9 above; for details, refer to the foregoing description, which is not repeated here.
The generative model may include a feature extraction module, and the tasks it can perform may include one or more of: converting input text into an image, converting input speech into an image, completing input image data, converting input text into speech, or converting the resolution of an input image; a minimal dispatch sketch follows the examples below.
For example, if the generative model is used to convert input text into an image and the data in the noise set includes multiple frames of text data, the generative model may convert the text into a representation vector, extract features from the representation vector, and output a corresponding image according to the features. For instance, if the input text includes "animal, cat", the text may be converted into a data type that a neural network can process, i.e., an embedding; features are extracted from the embedding, and an image including a cat is generated.
For another example, if the generative model is used to complete an input image, the data in the noise set may include multiple frames of images. An image in the noise set may be used as the input of the generative model, which extracts features from the image, infers the pixel values of the pixels to be completed according to the features, and obtains the completed image.
For another example, if the generative model is used to convert input speech into an image, the noise set may include multiple frames of speech data. The speech data is used as the input of the generative model, which recognizes the semantic features of the speech data, processes the extracted features, and generates a corresponding image.
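As the dispatch sketch promised above (every helper method here — embed_text, embed_speech, encode_image, generate — is a hypothetical placeholder, since this application does not define a serving API):

```python
def convert(model, user_input, task):
    """Route a user request to the deployed generative model.
    The model object and its helper methods are illustrative placeholders."""
    if task == "text_to_image":
        features = model.embed_text(user_input)     # text -> embedding
    elif task == "speech_to_image":
        features = model.embed_speech(user_input)   # speech -> semantic features
    elif task == "image_completion":
        features = model.encode_image(user_input)   # image -> features
    else:
        raise ValueError(f"unsupported task: {task}")
    return model.generate(features)                 # features -> output result
```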
After the generative model generates the output result based on the input data, the output result may be fed back to the user.
For example, if the generative model is deployed in the cloud, after the output result is obtained, it may be sent to the user's client and displayed there.
For another example, if the generative model is deployed in a local computing device, after the output result is obtained, it may be displayed on a display device provided in or connected to the local computing device.
Therefore, in the embodiments of this application, a generative model obtained through efficient training can be used for data conversion, achieving better generation results. The implicit generative model is lightweight, so deploying it does not occupy more storage resources; it is applicable to a variety of hardware devices and has strong generalization ability.
The output effect of the trained generative model is further introduced below with reference to specific implementation scenarios.
The model trained by this application is compared with commonly used generative models, taking GAN and WGAN as examples. The fitting effect is shown in Table 1, the generation effect may be as shown in FIG. 11, and the trend of the generative model's loss function may be as shown in FIG. 12.
Table 1
On the premise that generation quality and likelihood estimation are comparable to those of existing diffusion probabilistic models, the fitting effect and stability are both improved relative to adversarially trained implicit generative models; and relative to diffusion models, generation speed is improved by a factor of over one hundred.
In addition, some specific text-to-image generation processes are taken as examples.
For example, as shown in FIG. 13, the user may provide input at the client. The input text may include "ancient road, west wind, lean horse, ethereal and melodious, 3D painting, ancient style", and the generative model may output multiple frames of corresponding output images. The generative model can extract the features included in the input text, such as the entities "horse" and "ancient road" and the target data type "painting"; the output is of high definition and the user experience is friendly.
For example, as shown in FIG. 14, the user may provide input at the client. The input text may include "motorcycle, sunset, Chinese-style painting", and the generative model may output multiple frames of corresponding output images. The generative model can extract the features included in the input text, such as the entities "motorcycle" and "sunset", the target data type "painting", and the image style "Chinese style"; multiple features can be combined to generate multiple frames of images.
For example, as shown in FIG. 15, the user may provide input at the client. The input text may include "future city, science fiction illustration", and the generative model may output multiple frames of corresponding output images. The generative model can extract the features included in the input text, such as the entity "city", the target data type "illustration", and the image style "science fiction"; multiple features can be combined to generate multiple frames of images.
For example, as shown in FIG. 16, the user may provide input at the client. The input text may include "pyramid, Van Gogh style", and the generative model may output multiple frames of corresponding output images. The generative model can extract the features included in the input text, such as the entities "pyramid" and "Van Gogh"; the target data type may default to an image and the image style is "Van Gogh style"; multiple features can be combined to generate multiple frames of images.
For example, as shown in FIG. 17, the user may provide input at the client. The input text may include "a cup of coffee absorbing the energy of the universe, 3D painting", and the generative model may output multiple frames of corresponding output images. The generative model can extract the features included in the input text, such as the entities "coffee", "universe", and "energy"; the target data type may default to an image and the image style is "3D painting"; multiple features can be combined to generate multiple frames of images.
The foregoing has introduced the flow of the methods provided in this application. The apparatuses that execute the method steps are introduced below with reference to the foregoing method steps.
Referring to FIG. 18, a schematic structural diagram of a generative model training apparatus provided in this application, including:
a generation module 1801, configured to use data in a noise set as the input of a generative model and output at least one generated sample, where the generative model is used to perform data conversion on input data and the noise set includes multiple frames of noise data;
a first diffusion module 1802, configured to use the at least one generated sample as the input of a first diffusion model and output at least one first diffusion score value, where the first diffusion model is used to diffuse each generated sample at least once and score the diffused data; and
a training module 1803, configured to update the generative model according to the at least one first diffusion score value and at least one second diffusion score value output by a second diffusion model, to obtain an updated generative model, where the second diffusion model is trained using a real sample set, each sample in the real sample set includes a corresponding label, the second diffusion model is used to diffuse input data at least once and score the diffused data, the parameters of the first diffusion model and the second diffusion model differ, and the updated generative model is used to extract features from data input by a user into a computing device and generate corresponding data according to the extracted features.
In a possible implementation, the training module 1803 is specifically configured to:
update the first diffusion model using the at least one first diffusion score value to obtain an updated first diffusion model; use the at least one generated sample as the input of the updated first diffusion model to output at least one third diffusion score value, where the at least one second diffusion score value corresponds one-to-one with the at least one third diffusion score value; obtain the at least one second diffusion score value output by the second diffusion model; and update the generative model according to the loss value between each third diffusion score value in the at least one third diffusion score value and the corresponding second diffusion score value, to obtain the updated generative model.
In a possible implementation, the apparatus further includes: a second diffusion module 1804, configured to use samples in the real sample set as the input of the second diffusion model and output at least one fourth diffusion score value;
the training module 1803 is further configured to update the second diffusion model according to the at least one fourth diffusion score value to obtain an updated second diffusion model, and to use the samples in the real sample set as the input of the updated second diffusion model to output the at least one second diffusion score value.
In a possible implementation, the second diffusion model is a model pre-trained with the real sample set, and the training module 1803 is further configured to extract the at least one second diffusion score value from the second diffusion model.
In a possible implementation, the first diffusion model is configured to: add noise to a first generated sample according to a preset step size to obtain at least one first noise sample; and use the at least one first noise sample as the input of a first score function to output the at least one first diffusion score value.
In a possible implementation, when samples in the real sample set are used as the input of the second diffusion model, the second diffusion model is configured to: add noise to the samples in the real sample set according to a preset step size to obtain at least one second noise sample; and use the at least one second noise sample as the input of a second score function to obtain the at least one second diffusion score value.
In a possible implementation, the generative model is used to perform one or more of the following tasks: converting input text into an image, converting input speech into an image, completing input image data, converting input text into speech, or converting the resolution of an input image.
Referring to FIG. 19, a schematic structural diagram of a data conversion apparatus provided in this application, including:
a transceiver module 1901, configured to receive input data, where the input data includes data input by a user; and
a generation module 1902, configured to use the input data as the input of a generative model to obtain an output result, where the generative model is used to extract features from the input data and perform modeling using the extracted features to obtain the output result;
where the generative model is used to extract features from the input data and generate data of a preset type according to the extracted features; the generative model is trained according to the output results of a first diffusion model and a second diffusion model; the first diffusion model is trained using output samples of the generative model before training is completed; the second diffusion model is trained using a real sample set, each sample of which includes a corresponding label; the second diffusion model is used to diffuse input data at least once and score the diffused data; and the parameters of the first diffusion model and the second diffusion model differ.
Specifically, the generative model may be trained through the flow of the generative model training method corresponding to FIGS. 4 to 17 above.
In a possible implementation, the generative model is used to perform one or more of the following tasks: converting input text into an image, converting input speech into an image, completing input image data, converting input text into speech, or converting the resolution of an input image.
Referring to FIG. 20, a schematic structural diagram of another generative model training apparatus provided in this application, as described below.
The generative model training apparatus may include a processor 2001 and a memory 2002, interconnected by a line. The memory 2002 stores program instructions and data.
The memory 2002 stores the program instructions and data corresponding to the steps in FIGS. 4 to 17 above.
The processor 2001 is configured to execute the method steps performed by the generative model training apparatus shown in any of the embodiments in FIGS. 4 to 17 above.
Optionally, the generative model training apparatus may further include a transceiver 2003, configured to receive or send data.
An embodiment of this application further provides a computer-readable storage medium storing a program which, when run on a computer, causes the computer to execute the steps of the methods described in the embodiments shown in FIGS. 4 to 17 above.
Optionally, the generative model training apparatus shown in FIG. 20 above is a chip.
Referring to FIG. 21, a schematic structural diagram of another data conversion apparatus provided in this application, as described below.
The data conversion apparatus may include a processor 2101 and a memory 2102, interconnected by a line. The memory 2102 stores program instructions and data.
The memory 2102 stores the program instructions and data corresponding to the steps in FIGS. 4 to 17 above.
The processor 2101 is configured to execute the method steps performed by the data conversion apparatus shown in any of the embodiments in FIGS. 4 to 17 above.
Optionally, the data conversion apparatus may further include a transceiver 2103, configured to receive or send data.
An embodiment of this application further provides a computer-readable storage medium storing a program which, when run on a computer, causes the computer to execute the steps of the methods described in the embodiments shown in FIGS. 4 to 17 above.
Optionally, the data conversion apparatus shown in FIG. 21 above is a chip.
An embodiment of this application further provides a generative model training apparatus, which may also be called a digital processing chip or a chip. The chip includes a processing unit and a communication interface; the processing unit obtains program instructions through the communication interface, the program instructions are executed by the processing unit, and the processing unit is configured to execute the method steps performed by the generative model training apparatus shown in any of the embodiments in FIGS. 4 to 17 above.
An embodiment of this application further provides a data conversion apparatus, which may also be called a digital processing chip or a chip. The chip includes a processing unit and a communication interface; the processing unit obtains program instructions through the communication interface, the program instructions are executed by the processing unit, and the processing unit is configured to execute the method steps performed by the data conversion apparatus shown in any of the embodiments in FIGS. 4 to 17 above.
An embodiment of this application further provides a digital processing chip. The digital processing chip integrates circuits and one or more interfaces for implementing the processor 2001 above or the functions of the processor 2001. When a memory is integrated in the digital processing chip, the digital processing chip can complete the method steps of any one or more of the foregoing embodiments. When no memory is integrated in the digital processing chip, it can be connected to an external memory through a communication interface. The digital processing chip implements, according to the program code stored in the external memory, the actions performed by the generative model training apparatus in the foregoing embodiments.
The generative model training apparatus provided in the embodiments of this application may be a chip. The chip includes a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit can execute the computer-executable instructions stored in a storage unit, so that the chip in the server executes the generative model training method described in the embodiments shown in FIGS. 4 to 17 above. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
An embodiment of this application further provides a digital processing chip. The digital processing chip integrates circuits and one or more interfaces for implementing the processor 2101 above or the functions of the processor 2101. When a memory is integrated in the digital processing chip, the digital processing chip can complete the method steps of any one or more of the foregoing embodiments. When no memory is integrated in the digital processing chip, it can be connected to an external memory through a communication interface. The digital processing chip implements, according to the program code stored in the external memory, the actions performed by the data conversion apparatus in the foregoing embodiments.
The data conversion apparatus provided in the embodiments of this application may be a chip. The chip includes a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit can execute the computer-executable instructions stored in a storage unit, so that the chip in the server executes the data conversion method described in the embodiments shown in FIGS. 4 to 17 above. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
An embodiment of this application further provides a computer program product which, when run on a computer, causes the computer to execute the steps performed by the generative model training apparatus or the data conversion apparatus in the methods described in the embodiments shown in FIGS. 4 to 17 above.
Specifically, the aforementioned processing unit or processor may be a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
Illustratively, referring to FIG. 22, FIG. 22 is a schematic structural diagram of a chip provided in an embodiment of this application. The chip may be embodied as a neural network processor NPU 220. The NPU 220 is mounted on a host CPU as a coprocessor, and tasks are assigned by the host CPU. The core of the NPU is the operation circuit 2203; the controller 2204 controls the operation circuit 2203 to fetch matrix data from memory and perform multiplication.
In some implementations, the operation circuit 2203 internally includes multiple processing engines (PEs). In some implementations, the operation circuit 2203 is a two-dimensional systolic array. The operation circuit 2203 may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 2203 is a general-purpose matrix processor.
For example, suppose there are an input matrix A, a weight matrix B, and an output matrix C. The operation circuit fetches the data corresponding to matrix B from the weight memory 2202 and caches it on each PE in the operation circuit. The operation circuit fetches the data of matrix A from the input memory 2201, performs matrix operations with matrix B, and stores the partial or final results of the resulting matrix in the accumulator 2208.
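As a software analogue of this accumulation pattern (illustrative only; it mirrors the partial-sum behavior of the accumulator 2208, not the hardware's actual systolic dataflow):

```python
import numpy as np

def tiled_matmul(A, B, tile=4):
    """Blocked matrix multiply: slice the reduction dimension and accumulate
    each partial product into C, as the accumulator does for PE outputs."""
    m, k = A.shape
    _, n = B.shape
    C = np.zeros((m, n), dtype=A.dtype)
    for p in range(0, k, tile):
        C += A[:, p:p + tile] @ B[p:p + tile, :]   # partial result for one slice
    return C

A = np.random.rand(8, 16)
B = np.random.rand(16, 8)
assert np.allclose(tiled_matmul(A, B), A @ B)
```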
The unified memory 2206 is used to store input data and output data. Weight data is transferred to the weight memory 2202 directly through the direct memory access controller (DMAC) 2205. Input data is also transferred to the unified memory 2206 through the DMAC.
The bus interface unit (BIU) 2210 is used for the interaction between the AXI bus, the DMAC, and the instruction fetch buffer (IFB) 2209.
The bus interface unit 2210 is also used for the instruction fetch buffer 2209 to obtain instructions from external memory, and for the memory access controller 2205 to obtain the original data of input matrix A or weight matrix B from external memory.
The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 2206, to transfer weight data to the weight memory 2202, or to transfer input data to the input memory 2201.
The vector calculation unit 2207 includes multiple operation processing units and, when needed, performs further processing on the output of the operation circuit, such as vector multiplication, vector addition, exponential operations, logarithmic operations, and magnitude comparison. It is mainly used for non-convolutional/non-fully-connected layer computations in neural networks, such as batch normalization, pixel-wise summation, and upsampling of feature planes.
In some implementations, the vector calculation unit 2207 can store the processed output vector in the unified memory 2206. For example, the vector calculation unit 2207 may apply a linear function and/or a nonlinear function to the output of the operation circuit 2203, such as linearly interpolating the feature planes extracted by a convolutional layer, or accumulating vectors of values to generate activation values. In some implementations, the vector calculation unit 2207 generates normalized values, pixel-wise summed values, or both. In some implementations, the processed output vector can be used as the activation input to the operation circuit 2203, for example for use in subsequent layers of the neural network.
The instruction fetch buffer 2209 connected to the controller 2204 is used to store instructions used by the controller 2204.
The unified memory 2206, the input memory 2201, the weight memory 2202, and the instruction fetch buffer 2209 are all on-chip memories. The external memory is private to the NPU hardware architecture.
The operations of each layer in a recurrent neural network may be performed by the operation circuit 2203 or the vector calculation unit 2207.
The processor mentioned in any of the foregoing may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the programs of the methods in FIGS. 3 to 5 above.
It should also be noted that the apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. In addition, in the accompanying drawings of the apparatus embodiments provided in this application, the connection relationships between modules indicate that they have communication connections between them, which may specifically be implemented as one or more communication buses or signal lines.
Through the description of the foregoing implementations, those skilled in the art can clearly understand that this application can be implemented by software plus the necessary general-purpose hardware, and certainly can also be implemented by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. In general, any function completed by a computer program can easily be implemented with corresponding hardware, and the specific hardware structures used to implement the same function can be diverse, such as analog circuits, digital circuits, or dedicated circuits. However, for this application, a software program implementation is in more cases the better implementation. Based on this understanding, the technical solutions of this application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of this application.
In the foregoing embodiments, implementation may be wholly or partly by software, hardware, firmware, or any combination thereof. When software is used, implementation may be wholly or partly in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of this application are wholly or partly generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any usable medium that a computer can store, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), a semiconductor medium (e.g., a solid state disk (SSD)), or the like.
The terms "first", "second", "third", "fourth", and so on (if present) in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used is interchangeable where appropriate, so that the embodiments described here can be implemented in an order other than that illustrated or described here. Furthermore, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to those steps or units clearly listed, but may include other steps or units not clearly listed or inherent to the process, method, product, or device.
Finally, it should be noted that the above are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any person skilled in the art could readily conceive of changes or substitutions within the technical scope disclosed in this application, which shall all be covered by the protection scope of this application.

Claims (22)

  1. A generative model training method, characterized by comprising:
    using data in a noise set as the input of a generative model, and outputting at least one generated sample, wherein the generative model is used to perform data conversion on input data, and the noise set includes multiple frames of noise data;
    using the at least one generated sample as the input of a first diffusion model, and outputting at least one first diffusion score value, wherein the first diffusion model is used to diffuse each generated sample at least once and score the diffused data; and
    updating the generative model according to the at least one first diffusion score value and at least one second diffusion score value output by a second diffusion model, to obtain an updated generative model, wherein the second diffusion model is trained using a real sample set, each sample in the real sample set includes a corresponding label, the second diffusion model is used to diffuse input data at least once and score the diffused data, the parameters of the first diffusion model and the second diffusion model differ, and the updated generative model is used to extract features from data input by a user into a computing device and generate corresponding data according to the extracted features.
  2. The method according to claim 1, characterized in that the updating the generative model according to the at least one first diffusion score value and at least one second diffusion score value output by the second diffusion model, to obtain an updated generative model, comprises:
    updating the first diffusion model using the at least one first diffusion score value, to obtain an updated first diffusion model;
    using the at least one generated sample as the input of the updated first diffusion model, and outputting at least one third diffusion score value, wherein the at least one second diffusion score value corresponds one-to-one with the at least one third diffusion score value;
    obtaining the at least one second diffusion score value output by the second diffusion model; and
    updating the generative model according to the loss value between each third diffusion score value in the at least one third diffusion score value and the corresponding second diffusion score value, to obtain the updated generative model.
  3. The method according to claim 2, characterized in that the method further comprises:
    using samples in the real sample set as the input of the second diffusion model, and outputting at least one fourth diffusion score value; and
    updating the second diffusion model according to the at least one fourth diffusion score value, to obtain an updated second diffusion model;
    wherein the obtaining the at least one second diffusion score value output by the second diffusion model comprises:
    using the samples in the real sample set as the input of the updated second diffusion model, and outputting the at least one second diffusion score value.
  4. The method according to claim 2, characterized in that the second diffusion model is a model pre-trained with the real sample set, and the obtaining the at least one second diffusion score value output by the second diffusion model comprises:
    extracting the at least one second diffusion score value from the second diffusion model.
  5. The method according to any one of claims 1 to 4, characterized in that the first diffusion model is used to:
    add noise to a first generated sample according to a preset step size, to obtain the at least one first noise sample; and
    use the at least one first noise sample as the input of a first score function, and output the at least one first diffusion score value.
  6. The method according to claim 5, characterized in that, when the samples in the real sample set are used as the input of the second diffusion model, the second diffusion model is used to:
    add noise to the samples in the real sample set according to a preset step size, to obtain at least one second noise sample; and
    use the at least one second noise sample as the input of a second score function, to obtain the at least one second diffusion score value.
  7. The method according to any one of claims 1 to 6, characterized in that
    the generative model is used to perform one or more of the following tasks: converting input text into an image, converting input speech into an image, completing input image data, converting input text into speech, or converting the resolution of an input image.
  8. A data conversion method, characterized by comprising:
    receiving input data, wherein the input data includes data input by a user; and
    using the input data as the input of a generative model, to obtain an output result, wherein the generative model is used to extract features from the input data and perform modeling using the extracted features to obtain the output result;
    wherein the generative model is used to extract features from the input data and generate data of a preset type according to the extracted features, the generative model is trained according to the output results of a first diffusion model and a second diffusion model, the first diffusion model is trained using output samples of the generative model before training is completed, the second diffusion model is trained using a real sample set, each sample in the real sample set includes a corresponding label, the second diffusion model is used to diffuse input data at least once and score the diffused data, and the parameters of the first diffusion model and the second diffusion model differ.
  9. The method according to claim 8, characterized in that the generative model is used to perform one or more of the following tasks: converting input text into an image, converting input speech into an image, completing input image data, converting input text into speech, or converting the resolution of an input image.
  10. A generative model training apparatus, characterized by comprising:
    a generation module, configured to use data in a noise set as the input of a generative model and output at least one generated sample, wherein the generative model is used to perform data conversion on input data, and the noise set includes multiple frames of noise data;
    a first diffusion module, configured to use the at least one generated sample as the input of a first diffusion model and output at least one first diffusion score value, wherein the first diffusion model is used to diffuse each generated sample at least once and score the diffused data; and
    a training module, configured to update the generative model according to the at least one first diffusion score value and at least one second diffusion score value output by a second diffusion model, to obtain an updated generative model, wherein the second diffusion model is trained using a real sample set, each sample in the real sample set includes a corresponding label, the second diffusion model is used to diffuse input data at least once and score the diffused data, the parameters of the first diffusion model and the second diffusion model differ, and the updated generative model is used to extract features from data input by a user into a computing device and generate corresponding data according to the extracted features.
  11. The apparatus according to claim 10, characterized in that the training module is specifically configured to:
    update the first diffusion model using the at least one first diffusion score value, to obtain an updated first diffusion model;
    use the at least one generated sample as the input of the updated first diffusion model, and output at least one third diffusion score value, wherein the at least one second diffusion score value corresponds one-to-one with the at least one third diffusion score value;
    obtain the at least one second diffusion score value output by the second diffusion model; and
    update the generative model according to the loss value between each third diffusion score value in the at least one third diffusion score value and the corresponding second diffusion score value, to obtain the updated generative model.
  12. The apparatus according to claim 11, characterized in that the apparatus further comprises:
    a second diffusion module, configured to use samples in the real sample set as the input of the second diffusion model and output at least one fourth diffusion score value;
    wherein the training module is further configured to:
    update the second diffusion model according to the at least one fourth diffusion score value, to obtain an updated second diffusion model; and
    use the samples in the real sample set as the input of the updated second diffusion model, and output the at least one second diffusion score value.
  13. The apparatus according to claim 11, characterized in that the second diffusion model is a model pre-trained with the real sample set, and
    the training module is further configured to extract the at least one second diffusion score value from the second diffusion model.
  14. The apparatus according to any one of claims 10 to 13, characterized in that the first diffusion model is used to:
    add noise to a first generated sample according to a preset step size, to obtain the at least one first noise sample; and
    use the at least one first noise sample as the input of a first score function, and output the at least one first diffusion score value.
  15. The apparatus according to claim 14, characterized in that, when the samples in the real sample set are used as the input of the second diffusion model, the second diffusion model is used to:
    add noise to the samples in the real sample set according to a preset step size, to obtain at least one second noise sample; and
    use the at least one second noise sample as the input of a second score function, to obtain the at least one second diffusion score value.
  16. The apparatus according to any one of claims 10 to 15, characterized in that
    the generative model is used to perform one or more of the following tasks: converting input text into an image, converting input speech into an image, completing input image data, converting input text into speech, or converting the resolution of an input image.
  17. A data conversion apparatus, characterized by comprising:
    a transceiver module, configured to receive input data, wherein the input data includes data input by a user; and
    a generation module, configured to use the input data as the input of a generative model to obtain an output result, wherein the generative model is used to extract features from the input data and perform modeling using the extracted features to obtain the output result;
    wherein the generative model is used to extract features from the input data and generate data of a preset type according to the extracted features, the generative model is trained according to the output results of a first diffusion model and a second diffusion model, the first diffusion model is trained using output samples of the generative model before training is completed, the second diffusion model is trained using a real sample set, each sample in the real sample set includes a corresponding label, the second diffusion model is used to diffuse input data at least once and score the diffused data, and the parameters of the first diffusion model and the second diffusion model differ.
  18. The apparatus according to claim 17, characterized in that the generative model is used to perform one or more of the following tasks: converting input text into an image, converting input speech into an image, completing input image data, converting input text into speech, or converting the resolution of an input image.
  19. A generative model training apparatus, characterized by comprising a processor, wherein the processor is coupled to a memory, the memory stores a program, and when the program instructions stored in the memory are executed by the processor, the steps of the method according to any one of claims 1 to 7 are implemented.
  20. A data conversion apparatus, characterized by comprising a processor, wherein the processor is coupled to a memory, the memory stores a program, and when the program instructions stored in the memory are executed by the processor, the steps of the method according to any one of claims 8 to 9 are implemented.
  21. A computer-readable storage medium, characterized by comprising computer program instructions which, when executed by a processor, cause the processor to perform the method according to any one of claims 1 to 9.
  22. A computer program product, characterized in that the computer program product comprises software code for performing the steps of the method according to any one of claims 1 to 9.
PCT/CN2023/133865 2022-11-26 2023-11-24 Generative model training method, data conversion method, and apparatus WO2024109910A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211497412.1 2022-11-26
CN202211497412.1A CN118095368A (zh) 2022-11-26 2022-11-26 Generative model training method, data conversion method, and apparatus

Publications (1)

Publication Number Publication Date
WO2024109910A1 (zh)

Family ID: 91144652

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/133865 WO2024109910A1 (zh) Generative model training method, data conversion method, and apparatus 2022-11-26 2023-11-24

Country Status (2)

Country Link
CN (1) CN118095368A (zh)
WO (1) WO2024109910A1 (zh)


Also Published As

Publication number Publication date
CN118095368A (zh) 2024-05-28


Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23893999

Country of ref document: EP

Kind code of ref document: A1