CN116721186A

CN116721186A - Drawing image generation method and device, electronic equipment and storage medium

Info

Publication number: CN116721186A
Application number: CN202311003292.XA
Authority: CN
Inventors: 陈煌榕; 门征; 徐元春
Original assignee: Beijing Hongmian Xiaoice Technology Co Ltd
Current assignee: Beijing Xiaobing Yuedong Technology Co ltd
Priority date: 2023-08-10
Filing date: 2023-08-10
Publication date: 2023-09-08
Anticipated expiration: 2043-08-10
Also published as: CN116721186B

Abstract

The invention provides a drawing image generation method and device, an electronic device, and a storage medium, and relates to the technical field of artificial intelligence. The method comprises: acquiring text information and a feature control image, wherein the text information is text describing the picture of a drawing image to be generated, and the drawing image to be generated has the image features of the feature control image; obtaining, based on the text information, a text encoding corresponding to the text information; intervening, based on the text encoding, in the image encoding obtained in each round of inference during the multi-round iterative inference of a diffusion model, to obtain a base image encoding for each round of inference; obtaining, based on the feature control image, a feature control image encoding corresponding to the feature control image; and intervening, based on the feature control image encoding, in the base image encoding of a target round of inference until the inference process ends, to obtain the drawing image to be generated. The method reduces the load on video memory and the graphics processor and lowers the cost of generating drawing images.

Description

Painting image generation method, device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of artificial intelligence technology, and in particular to a painting image generation method and device, an electronic device, and a storage medium.

Background

In the related art, the industry uses diffusion models (also called Diffusion models) as the main tool for generating painting images. Compared with other generative models, a Diffusion model denoises gradually over multiple rounds of iteration, so the painting images it generates are of higher quality.

However, controlling the content of the generated painting image requires integrating an auxiliary neural network model into the Diffusion model. Because additional auxiliary neural network models must be introduced during generation, the load on video memory and the graphics processor increases, which raises the cost of generating painting images.

Summary of the Invention

The present invention provides a painting image generation method and device, an electronic device, and a storage medium, which make it possible to automatically obtain a painting image to be generated that has the image features of a feature control image without introducing any additional auxiliary neural network model, thereby reducing the load on video memory and the graphics processor and lowering the cost of generating painting images.

The present invention provides a painting image generation method, the method comprising: acquiring text information and a feature control image, wherein the text information is text describing the picture of a painting image to be generated, and the painting image to be generated has the image features of the feature control image; obtaining, based on the text information, a text encoding corresponding to the text information; intervening, based on the text encoding, in the image encoding obtained in each round of inference during the multi-round iterative inference of a diffusion model, to obtain a base image encoding for each round of inference; obtaining, based on the feature control image, a feature control image encoding corresponding to the feature control image; and intervening, based on the feature control image encoding, in the base image encoding of a target round of inference until the inference process ends, to obtain the painting image to be generated.

According to the painting image generation method provided by the present invention, intervening, based on the feature control image encoding, in the base image encoding of the target round of inference until the inference process ends specifically comprises: when the target round of inference precedes a reference round of inference, performing, based on the feature control image encoding, full replacement and intervention on the base image encoding at a first preset replacement rate until the inference process ends, wherein the first preset replacement rate is greater than or equal to a replacement rate threshold, and the Gaussian noise of the base image encoding obtained before the reference round of inference is greater than or equal to a noise threshold.

According to the painting image generation method provided by the present invention, intervening, based on the feature control image encoding, in the base image encoding of the target round of inference until the inference process ends specifically comprises: when the target round of inference follows the reference round of inference, performing, based on the feature control image encoding, local replacement and intervention on the base image encoding corresponding to an activation region at a second preset replacement rate until the inference process ends, wherein the second preset replacement rate is less than the replacement rate threshold; the activation region is the region of the painting image to be generated that requires intervention, as determined from the image features of the feature control image; and the Gaussian noise of the base image encoding obtained after the reference round of inference is less than the noise threshold.

According to the painting image generation method provided by the present invention, there are two or more feature control images and, correspondingly, two or more feature control image encodings; intervening, based on the feature control image encoding, in the base image encoding of the target round of inference until the inference process ends specifically comprises: determining a replacement weight value for each feature control image encoding; and, according to the replacement weight values, replacing and intervening in the base image encoding of the target round of inference based on the feature control image encodings until the inference process ends.
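One possible reading of the weighted replacement above is sketched below. It assumes that each replacement weight value acts as a linear blend weight and that whatever weight is left over stays on the base image encoding; the patent does not state the exact formula. The helper `blend_multiple` and the toy encodings are hypothetical.

```python
def blend_multiple(base_latent, control_latents, weights):
    """Blend several control encodings into the base encoding (assumed linear blend).

    weights[i] is the replacement weight of control_latents[i]; the
    remaining weight (1 - sum(weights)) stays on the base encoding.
    """
    total = sum(weights)
    if total > 1.0:
        raise ValueError("replacement weights must sum to at most 1")
    keep = 1.0 - total
    return [
        keep * b + sum(w * c[i] for w, c in zip(weights, control_latents))
        for i, b in enumerate(base_latent)
    ]


# Toy flat encodings; real encodings would be multi-dimensional latents.
base = [0.0, 2.0]
depth_control = [1.0, 1.0]
edge_control = [4.0, 0.0]
mixed = blend_multiple(base, [depth_control, edge_control], [0.5, 0.25])
```

In this sketch the depth control contributes half of each blended value, the edge control a quarter, and the base encoding the remaining quarter.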

According to the painting image generation method provided by the present invention, replacing and intervening in the base image encoding of the target round of inference based on the feature control image encoding until the inference process ends specifically comprises: iteratively adding noise to the feature control image encoding over multiple rounds, up to the round corresponding to the target round of inference, to obtain a noised feature control image encoding; and, based on the noised feature control image encoding, replacing and intervening in the base image encoding of the target round of inference until the inference process ends.
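If the diffusion model follows the standard DDPM forward process (an assumption; the patent does not specify the noise schedule), the round-by-round noising described above has a well-known closed form. The symbols $c_0$ (feature control image encoding), $c_t$ (noised encoding), $\beta_s$ (per-step noise variance), and $t$ (the noise level matching the target round) are introduced here for illustration only.

```latex
c_t = \sqrt{\bar{\alpha}_t}\, c_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I),
\qquad \bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s)
```

Under this assumption the noised encoding carries the same noise level as the base image encoding it replaces. Because inference runs from high noise to low noise, later inference rounds correspond to smaller $t$.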

According to the painting image generation method provided by the present invention, when the diffusion model is a depth-to-image diffusion model, intervening, based on the feature control image encoding, in the base image encoding of the target round of inference until the inference process ends specifically comprises: intervening, based on the feature control image encoding, in target channels of the base image encoding of the target round of inference until the inference process ends, wherein the target channels are the channels other than the channel corresponding to depth information, and the depth information is the image depth information corresponding to the base image encoding.
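A minimal sketch of channel-selective intervention follows, assuming the encoding is stored as a list of channels and the intervention is a linear blend. The function name, the `depth_channel` index, and the blend formula are hypothetical; the patent states only that the depth channel is excluded.

```python
def intervene_non_depth_channels(base_channels, control_channels, depth_channel, rate):
    """Blend every channel except the one carrying depth information."""
    result = []
    for idx, (base, control) in enumerate(zip(base_channels, control_channels)):
        if idx == depth_channel:
            # The depth channel is copied through unchanged.
            result.append(list(base))
        else:
            # Assumed linear blend toward the control encoding.
            result.append([(1.0 - rate) * b + rate * c for b, c in zip(base, control)])
    return result


# Toy encoding with three channels; channel 2 is treated as the depth channel.
base = [[0.0, 0.0], [0.0, 0.0], [9.0, 9.0]]
control = [[1.0, 1.0], [1.0, 1.0], [5.0, 5.0]]
out = intervene_non_depth_channels(base, control, depth_channel=2, rate=0.5)
```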

According to the painting image generation method provided by the present invention, the feature control image includes one or more of: a feature control image with preset depth image features, a feature control image with preset edge structure image features, and a feature control image with preset pose image features.

The present invention also provides a painting image generation device, the device comprising: an acquisition module, configured to acquire text information and a feature control image, wherein the text information is text describing the picture of a painting image to be generated, and the painting image to be generated has the image features of the feature control image; a text encoding module, configured to obtain, based on the text information, a text encoding corresponding to the text information; an intervention module, configured to intervene, based on the text encoding, in the image encoding obtained in each round of inference during the multi-round iterative inference of a diffusion model, to obtain a base image encoding for each round of inference; an image encoding module, configured to obtain, based on the feature control image, a feature control image encoding corresponding to the feature control image; and a generation module, configured to intervene, based on the feature control image encoding, in the base image encoding of a target round of inference until the inference process ends, to obtain the painting image to be generated.

The present invention also provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements any of the painting image generation methods described above.

The present invention also provides a non-transitory computer-readable storage medium storing a computer program that, when executed by a processor, implements any of the painting image generation methods described above.

The present invention also provides a computer program product comprising a computer program that, when executed by a processor, implements any of the painting image generation methods described above.

In the painting image generation method and device, electronic device, and storage medium provided by the present invention, text information and a feature control image are acquired; based on the text encoding corresponding to the text information, the image encoding obtained in each round of inference during the multi-round iterative inference of the diffusion model is intervened in to obtain the base image encoding of each round; then, based on the feature control image encoding corresponding to the feature control image, the base image encoding of the target round of inference is intervened in until the inference process ends. A painting image to be generated that has the image features of the feature control image can thus be obtained automatically without introducing any additional auxiliary neural network model, which reduces the load on video memory and the graphics processor and lowers the cost of generating painting images.

Brief Description of the Drawings

To explain the technical solutions of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Figure 1 is a schematic flow chart of the painting image generation method provided by the present invention;

Figure 2 is a schematic flow chart, provided by the present invention, of intervening in the base image encoding of the target round of inference based on the feature control image encoding until the inference process ends;

Figure 3 is a schematic flow chart, provided by the present invention, of replacing and intervening in the base image encoding of the target round of inference based on the feature control image encoding until the inference process ends;

Figure 4 is a schematic diagram of an application scenario of the painting image generation method provided by the present invention;

Figure 5 is a schematic structural diagram of the painting image generation device provided by the present invention;

Figure 6 is a schematic structural diagram of the electronic device provided by the present invention.

Detailed Description

To make the purpose, technical solutions, and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.

Figure 1 is a schematic flow chart of the painting image generation method provided by the present invention.

The painting image generation method provided by the present invention is further described below with reference to the following embodiments.

In an exemplary embodiment of the present invention, as shown in Figure 1, the painting image generation method may include steps 110 to 150, each of which is described below.

In step 110, text information and a feature control image are acquired.

The text information is text describing the picture of the painting image to be generated, and the painting image to be generated has the image features of the feature control image.

It can be understood that the text information describes the overall picture of the painting image to be generated; in other words, the painting image to be generated can be regarded as the text information presented in image form.

Besides depicting the picture described by the text information, the painting image to be generated also has image features of its own, such as image structure features, image color features, pose features of the elements in the image, or depth information features of the image. In application, these image features can be steered according to the image features of a feature control image provided in advance. It can be understood that an image steered by a feature control image can also have the image features of that feature control image.

In yet another exemplary embodiment of the present invention, the feature control image may include one or more of: a feature control image with preset depth image features, a feature control image with preset edge structure image features, and a feature control image with preset pose image features.

This embodiment does not limit the specific content of the feature control image, which can be determined according to the user's actual needs.

In step 120, a text encoding corresponding to the text information is obtained based on the text information.

In one embodiment, the text information can be encoded with a language model, such as the CLIP model or the T5 model, to obtain the text encoding corresponding to the text information. In the subsequent multi-round iterative inference based on the diffusion model, the text encoding serves as one factor that steers the inference process.

In step 130, based on the text encoding, the image encoding obtained in each round of inference during the multi-round iterative inference of the diffusion model is intervened in to obtain the base image encoding of each round of inference.

It should be noted that during the multi-round iterative inference of the diffusion model (Diffusion model), an image encoding is obtained in each round; the image encoding is produced by the Diffusion model iterating from noise. In application, the image encoding can be intervened in based on the text encoding to obtain the base image encoding of each round of inference. Because the base image encoding results from intervention by the text encoding, each base image encoding can be regarded as an intermediate state on the way to the painting image to be generated by diffusion model inference.

In yet another embodiment, a cross-attention model can be used to fuse the text encoding with the image encoding, so that the text encoding intervenes in the image encoding and the base image encoding is obtained.
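As an illustration of the fusion described above, the following is a minimal sketch of single-head cross-attention in which image-encoding positions act as queries and text-encoding tokens act as keys and values. The function names, the omission of learned projection matrices, and the toy dimensions are assumptions made for illustration; the patent does not specify the attention layout.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(image_tokens, text_tokens):
    """Fuse text tokens into image tokens (hypothetical helper).

    image_tokens: list of query vectors, one per latent position.
    text_tokens:  list of key/value vectors, one per text token.
    Learned Q/K/V projections are omitted, which is a simplification.
    """
    dim = len(text_tokens[0])
    fused = []
    for q in image_tokens:
        # Scaled dot-product score of this image position against each text token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in text_tokens]
        weights = softmax(scores)
        # Attention output: weighted sum of the text tokens.
        fused.append([sum(w * v[d] for w, v in zip(weights, text_tokens))
                      for d in range(dim)])
    return fused


image_tokens = [[1.0, 0.0], [0.0, 1.0]]
text_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
base_encoding = cross_attention(image_tokens, text_tokens)
```

Each output vector is a convex combination of the text tokens, weighted toward the tokens most similar to the corresponding image position.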

In step 140, a feature control image encoding corresponding to the feature control image is obtained based on the feature control image.

In step 150, based on the feature control image encoding, the base image encoding of the target round of inference is intervened in until the inference process ends, to obtain the painting image to be generated.

In one embodiment, feature extraction can be performed on the feature control image to obtain the corresponding feature control image encoding. The feature control image encoding can then be used to intervene in the base image encoding obtained in every round of inference, with inference iterating until the process ends.

In yet another example, the intervention based on the feature control image encoding can instead begin at the base image encoding obtained in the target round of inference and continue through the iterative inference until the process ends. The target round can be chosen according to the actual situation. Taking a Diffusion model with 50 rounds of iterative inference as an example, the target round can be any one of the 50 rounds. When the target round is the first round, intervention based on the feature control image encoding starts from the base image encoding of the first round and continues until the 50th round ends, yielding a painting image to be generated that has the image features of the feature control image. With this embodiment, such a painting image can be obtained automatically without introducing any additional auxiliary neural network model, which reduces the load on video memory and the graphics processor and lowers the cost of generating painting images.
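The round-level control flow described above can be sketched as follows. `denoise_step` and `intervene` are hypothetical placeholders for the text-conditioned diffusion step and the feature-control intervention. Rounds are numbered from 1, and the 50-round total follows the example in the text.

```python
def run_inference(latent, total_rounds, target_round, denoise_step, intervene):
    """Iterate the diffusion rounds, intervening from target_round onward."""
    intervened_rounds = []
    for round_idx in range(1, total_rounds + 1):
        # The text-conditioned step yields the base image encoding of this round.
        latent = denoise_step(latent, round_idx)
        if round_idx >= target_round:
            # Feature-control intervention, applied until inference ends.
            latent = intervene(latent, round_idx)
            intervened_rounds.append(round_idx)
    return latent, intervened_rounds


# Toy stand-ins: a scalar "latent" and trivial step functions.
final, rounds = run_inference(
    latent=0.0,
    total_rounds=50,
    target_round=1,
    denoise_step=lambda z, t: z + 1.0,
    intervene=lambda z, t: z,
)
```

With `target_round=1` every round is intervened in; a later target round leaves the earlier rounds to the text encoding alone.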

It should be noted that, in current approaches that introduce an auxiliary neural network model to obtain a painting image with the image features of a feature control image, an independent auxiliary neural network model must be trained separately, which costs time and effort; in addition, the auxiliary module plug-in corresponding to the auxiliary neural network model may turn out to be incompatible with the Diffusion model when it is inserted. By contrast, the painting image generation method provided by the present invention intervenes in the intermediate states obtained during the iterative inference of the Diffusion model, such as the base image encoding, until the inference process ends. A painting image to be generated that has the image features of the feature control image can therefore be obtained automatically without introducing any additional auxiliary neural network model. This avoids both the time and effort of training an independent auxiliary neural network model and the incompatibility between the corresponding auxiliary module plug-in and the Diffusion model, so that the auxiliary module is decoupled from the Diffusion model. Here, the auxiliary module can be understood as a processing module capable of steering the painting image to be generated so that it has the image features of the feature control image.

The feature control image can be adjusted according to the user's needs, and the target round can likewise be determined according to the user's needs. The intervention of the feature control image in the base image encoding can thus be controlled automatically and personalized during generation, yielding a painting image to be generated that meets the user's needs.

In the painting image generation method provided by the present invention, text information and a feature control image are acquired; based on the text encoding corresponding to the text information, the image encoding obtained in each round of inference during the multi-round iterative inference of the diffusion model is intervened in to obtain the base image encoding of each round; then, based on the feature control image encoding corresponding to the feature control image, the base image encoding of the target round of inference is intervened in until the inference process ends. A painting image to be generated that has the image features of the feature control image can thus be obtained automatically without introducing any additional auxiliary neural network model, which reduces the load on video memory and the graphics processor and lowers the cost of generating painting images.

Figure 2 is a schematic flow chart, provided by the present invention, of intervening in the base image encoding of the target round of inference based on the feature control image encoding until the inference process ends.

The painting image generation method provided by the present invention is further described below with reference to Figure 2.

In an exemplary embodiment of the present invention, as shown in Figure 2, intervening in the base image encoding of the target round of inference based on the feature control image encoding until the inference process ends may include step 210 and step 220, each of which is described below.

In step 210, when the target round of inference precedes the reference round of inference, full replacement and intervention are performed on the base image encoding, based on the feature control image encoding, at the first preset replacement rate until the inference process ends.

In step 220, when the target round of inference follows the reference round of inference, local replacement and intervention are performed on the base image encoding corresponding to the activation region, based on the feature control image encoding, at the second preset replacement rate until the inference process ends.

Here, the first preset replacement rate is greater than or equal to the replacement rate threshold; the second preset replacement rate is less than the replacement rate threshold; the activation region is the region of the painting image to be generated that requires intervention, as determined from the image features of the feature control image; the Gaussian noise of the base image encoding obtained before the reference round of inference is greater than or equal to the noise threshold; and the Gaussian noise of the base image encoding obtained after the reference round of inference is less than the noise threshold.

在一种实施例中，可以采用DDIM算法作为加快推理和减少迭代轮次的优化方法。继续以Diffusion模型的迭代推理过程为50轮次为例进行说明。在50轮次的迭代推理中，前25轮次迭代推理过程中的高斯噪声明显，几乎看不出实际内容；后25轮次迭代推理过程中内容逐渐清晰化。因此，在前25轮次进行深度中间态干预，每轮次在权重上设置为0.7的全面替换率。后25轮次进行浅层次干预，只更新激活区域的替换，同时设置替换率为0.4。因此，基于本发明提供的绘画图像生成方法，对于新的辅助模块介入，也无需额外训练独立辅助网络进行控制生成。In one embodiment, the DDIM algorithm can be used as an optimization method to speed up inference and reduce iteration rounds. Let’s continue to take the iterative reasoning process of the Diffusion model as 50 rounds as an example. In the 50 rounds of iterative reasoning, the Gaussian noise in the first 25 rounds of iterative reasoning was obvious, and the actual content was almost invisible; the content gradually became clear in the last 25 rounds of iterative reasoning. Therefore, deep intermediate state intervention is carried out in the first 25 rounds, and each round is set to a comprehensive replacement rate of 0.7 in weight. In the last 25 rounds, shallow intervention is performed, only the replacement of the active area is updated, and the replacement rate is set to 0.4. Therefore, based on the painting image generation method provided by the present invention, there is no need to additionally train an independent auxiliary network to control the generation of new auxiliary modules.

In yet another embodiment, the reference round may be understood as the aforementioned 25th round. The Gaussian noise of the basic image encoding obtained before the reference round of inference is greater than or equal to a noise threshold. Therefore, when the target round of inference precedes the reference round of inference, the basic image encoding of the target round of inference may be fully replaced and intervened in according to a first preset replacement rate, until the iterative inference process ends. The first preset replacement rate may correspond to the 0.7 mentioned above; this embodiment does not specifically limit it. It should be noted that, when the total number of iterative inference rounds is 50, the first preset replacement rate may be set above 0.5 and the second preset replacement rate may be set below 0.5.

In yet another embodiment, the Gaussian noise of the basic image encoding obtained after the reference round of inference is less than the noise threshold. Therefore, when the target round of inference follows the reference round of inference, the basic image encoding corresponding to the activation area may be locally replaced and intervened in according to a second preset replacement rate, until the iterative inference process ends. The second preset replacement rate may correspond to the 0.4 mentioned above; this embodiment does not specifically limit it.

The activation area can be understood as follows: if it is determined, from the image features of the feature control image, that the background of the painting image to be generated needs intervention, then the activation area is the area corresponding to the background of the painting image to be generated.
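
A minimal sketch of the difference between full and local replacement follows. It assumes, purely for illustration, that the replacement rate acts as a linear blending weight between the basic image encoding and the feature control image encoding, and that the activation area is supplied as a Boolean mask. The source defines neither detail, and `replace_encoding` is a hypothetical helper.

```python
from typing import List, Optional, Sequence


def replace_encoding(base: Sequence[float],
                     control: Sequence[float],
                     rate: float,
                     active: Optional[Sequence[bool]] = None) -> List[float]:
    """Blend `control` into `base` at `rate`.

    With `active=None` every element is blended (full replacement);
    otherwise only elements flagged True are blended (local replacement).
    """
    if len(base) != len(control):
        raise ValueError("base and control must have the same length")
    if active is not None and len(active) != len(base):
        raise ValueError("active mask must match the encoding length")
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    out = []
    for i, (b, c) in enumerate(zip(base, control)):
        if active is None or active[i]:
            out.append((1.0 - rate) * b + rate * c)
        else:
            out.append(b)  # outside the activation area: left unchanged
    return out


base = [0.0, 0.0, 0.0, 0.0]
control = [1.0, 1.0, 1.0, 1.0]
# Before the reference round: full replacement at the first preset rate.
full = replace_encoding(base, control, rate=0.7)
# After the reference round: local replacement at the second preset rate.
local = replace_encoding(base, control, rate=0.4,
                         active=[True, True, False, False])
```

In the local case the last two elements keep their original values, which mirrors the statement that only the activation area is updated in the later rounds.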

Figure 3 is a schematic flowchart, provided by the present invention, of replacing and intervening in the basic image encoding of the target round of inference based on the feature control image encoding until the inference process ends.

The following description refers to the embodiment shown in Figure 3.

In an exemplary embodiment of the present invention, there may be two or more feature control images and, correspondingly, two or more feature control image encodings. As shown in Figure 3, intervening in the basic image encoding of the target round of inference based on the feature control image encoding until the inference process ends may include step 310 and step 320, each of which is described below.

In step 310, a replacement weight value is determined for each feature control image encoding.

In step 320, according to the replacement weight values, the basic image encoding of the target round of inference is replaced and intervened in based on the feature control image encodings until the inference process ends.

In one embodiment, when there are two or more feature control image encodings, directly summing them and substituting the sum at the corresponding positions of the image corresponding to the basic image encoding would produce excessively high values. In application, therefore, a replacement weight value may be determined for each feature control image encoding, and the basic image encoding of the target round of inference may be replaced and intervened in according to those replacement weight values, based on the feature control image encodings, until the inference process ends.
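
One way to see why weighting avoids the inflated values is the following worked bound. It is a sketch under assumptions: the symbols $c_t^{(i)}$ (the $i$-th feature control image encoding at round $t$), $w_i$ (its replacement weight value) and $K$ (the number of encodings) are notation introduced here, and the requirement that the weights sum to 1 is an assumption, suggested by but not stated in the source.

```latex
\bar{c}_t = \sum_{i=1}^{K} w_i\, c_t^{(i)},
\qquad w_i \ge 0,\quad \sum_{i=1}^{K} w_i = 1
\;\;\Longrightarrow\;\;
\lVert \bar{c}_t \rVert_\infty \le \max_{1 \le i \le K} \lVert c_t^{(i)} \rVert_\infty
```

Under these assumptions the combined encoding never exceeds the largest individual encoding in magnitude, whereas the unweighted sum $\sum_i c_t^{(i)}$ can reach $K$ times that bound.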

In yet another embodiment, the basic image encoding of the target round of inference may likewise be replaced and intervened in according to the replacement weight values, based on the feature control image encodings, until the inference process ends, where the basic image encoding is understood to be the basic image encoding corresponding to the activation area.

In yet another exemplary embodiment of the present invention, replacing and intervening in the basic image encoding of the target round of inference based on the feature control image encoding until the inference process ends may also be implemented as follows:

Iteratively adding noise to the feature control image encoding over multiple rounds, up to the round corresponding to the target round of inference, to obtain a noise-added feature control image encoding;

Replacing and intervening in the basic image encoding of the target round of inference, based on the noise-added feature control image encoding, until the inference process ends.

In one embodiment, noise may be added iteratively, over multiple rounds, to the feature control image encoding corresponding to the feature control image, up to the round corresponding to the target round of inference. This can be understood as the reverse of the inference process, and it yields the noise-added feature control image encoding. The basic image encoding of the target round of inference may then be replaced and intervened in, based on the noise-added feature control image encoding, until the iterative inference process ends. In this embodiment, the feature control image encoding is noised before it is used to replace and intervene in the basic image encoding. This keeps the feature control image encoding and the basic image encoding aligned in inference round throughout the iterative inference process, so that in every round of intervention after the target round of inference the two encodings correspond to the same round, which allows the basic image encoding to be replaced and intervened in more effectively.
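
The noising step can be illustrated with the standard DDPM-style forward process, applied one step at a time. This is a sketch under assumptions: the linear beta schedule, the helper names `linear_betas` and `add_noise_to_round`, and the flat-list representation of an encoding are not taken from the source, which does not specify a noise schedule. How an inference round maps to a number of noising steps is left to the caller.

```python
import math
import random
from typing import List, Sequence


def linear_betas(num_steps: int,
                 beta_start: float = 1e-4,
                 beta_end: float = 2e-2) -> List[float]:
    """Linear noise schedule (an assumed choice, not taken from the source)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def add_noise_to_round(control: Sequence[float],
                       betas: Sequence[float],
                       target_step: int,
                       rng: random.Random) -> List[float]:
    """Apply `target_step` forward steps x_k = sqrt(1 - b_k) * x_{k-1} + sqrt(b_k) * eps."""
    if not 0 <= target_step <= len(betas):
        raise ValueError("target_step must lie in [0, len(betas)]")
    x = list(control)
    for beta in betas[:target_step]:
        keep, spread = math.sqrt(1.0 - beta), math.sqrt(beta)
        # Each element receives independent standard Gaussian noise.
        x = [keep * v + spread * rng.gauss(0.0, 1.0) for v in x]
    return x


rng = random.Random(0)  # fixed seed so the sketch is reproducible
control_encoding = [0.5, -0.25, 1.0, 0.0]
noised = add_noise_to_round(control_encoding, linear_betas(50),
                            target_step=30, rng=rng)
```

Noising the control encoding to the same level as the basic image encoding is what keeps the two encodings comparable when one is blended into the other.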

Figure 4 is a schematic diagram of an application scenario of the painting image generation method provided by the present invention.

To further explain the aforementioned process of replacing and intervening in the basic image encoding of the target round of inference based on the feature control image encoding until the inference process ends, a description is given below with reference to Figure 4.

In one embodiment, as shown in Figure 4, the path from X_T to X_0 can be understood as the 50 inference rounds of the iterative inference process based on the Diffusion model. X_T is the image encoding corresponding to the random noise at the start of inference by the Diffusion model, and X_0 can be understood as the feature control image encoding corresponding to the feature control image. When the target round of inference is round t in Figure 4, noise may be added iteratively to the feature control image encoding over multiple rounds, up to the round corresponding to the target round of inference (round t), to obtain the noise-added feature control image encoding. The noise-added feature control image encoding may correspond to q(x_t | x_{t-1}) in Figure 4, and the basic image encoding may correspond to p(x_{t-1} | x_t) in Figure 4. In application, when the feature control image encoding intervenes starting from round t, it must likewise be processed to round t by reverse noise addition. Further, across rounds T to 0, that is, in each intermediate-state step from round t to round t-1, different replacement weight values may be set for different feature control image encodings, and the intervention and replacement are carried out according to those replacement weight values.
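
For reference, the forward transition written q(x_t | x_{t-1}) has a well-known form in the standard DDPM formulation, shown below. The source does not state these formulas, so it is an assumption that Figure 4 follows this formulation; $\beta_t$ and $\bar{\alpha}_t$ are the usual DDPM symbols and do not appear in the source.

```latex
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right),
\qquad
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\,I\right),
\quad
\bar{\alpha}_t = \prod_{s=1}^{t} (1-\beta_s)
```

Under this formulation the second expression allows the feature control image encoding to be noised to round t in a single draw, which is statistically equivalent to applying the per-round transition t times.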

In one example, when the feature control image encodings are a feature control image encoding corresponding to canny and a feature control image encoding corresponding to pose, the replacement weight value of the encoding corresponding to canny may be set to 0.6 and that of the encoding corresponding to pose may be set to 0.4.
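
The canny and pose example can be sketched as a weighted sum of two encodings. The helper `combine_controls`, the dictionary layout, the toy values and the check that the weights sum to 1 are assumptions made for illustration; the source only gives the two weight values.

```python
from typing import Dict, List, Sequence


def combine_controls(controls: Dict[str, Sequence[float]],
                     weights: Dict[str, float]) -> List[float]:
    """Weighted sum of several control encodings of equal length."""
    if set(controls) != set(weights):
        raise ValueError("controls and weights must use the same keys")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights are expected to sum to 1")
    lengths = {len(v) for v in controls.values()}
    if len(lengths) != 1:
        raise ValueError("all control encodings must have the same length")
    n = lengths.pop()
    return [sum(weights[name] * controls[name][i] for name in controls)
            for i in range(n)]


# Toy encodings: both controls are active at the last position.
controls = {"canny": [1.0, 0.0, 1.0], "pose": [0.0, 1.0, 1.0]}
weights = {"canny": 0.6, "pose": 0.4}
combined = combine_controls(controls, weights)
```

At the position where both controls are active, an unweighted sum would give 2.0, while the weighted sum stays at the scale of a single encoding.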

In yet another exemplary embodiment of the present invention, when the diffusion model is a diffusion model from depth information to image (corresponding to the depth to Image generation mode), and continuing with the embodiment described in Figure 1 as an example, intervening in the basic image encoding of the target round of inference based on the feature control image encoding until the inference process ends (corresponding to step 150) may be implemented as follows:

Intervening, based on the feature control image encoding, in the target channels of the basic image encoding of the target round of inference until the inference process ends, where the target channels are the channels other than the channel corresponding to the depth information, and the depth information is the image depth information corresponding to the basic image encoding.

In one embodiment, in the depth to Image generation mode under the unet framework, intervention may be applied to the first four channels only, while the depth information in the fifth channel is left unchanged, which achieves a dual control effect. The first four channels correspond to the target channels described above, and the fifth channel corresponds to the depth information channel described above.
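
The channel-selective intervention can be sketched as follows. The sketch assumes a five-channel encoding stored as a list of channels, a four-channel control encoding, and linear blending at a given rate; `intervene_target_channels` is a hypothetical helper, and none of these representation choices come from the source.

```python
from typing import List, Sequence

Channel = List[float]


def intervene_target_channels(base: Sequence[Channel],
                              control: Sequence[Channel],
                              rate: float,
                              depth_channel: int = 4) -> List[Channel]:
    """Blend `control` into every channel except `depth_channel`."""
    target_indices = [i for i in range(len(base)) if i != depth_channel]
    if len(control) != len(target_indices):
        raise ValueError("control must provide one channel per target channel")
    out = [list(ch) for ch in base]  # copy, so the depth channel is kept as-is
    for ctrl, idx in zip(control, target_indices):
        if len(ctrl) != len(base[idx]):
            raise ValueError("channel sizes must match")
        out[idx] = [(1.0 - rate) * b + rate * c
                    for b, c in zip(base[idx], ctrl)]
    return out


# Four latent channels followed by one depth channel.
base = [[0.0, 0.0] for _ in range(4)] + [[0.9, 0.1]]
control = [[1.0, 1.0] for _ in range(4)]
result = intervene_target_channels(base, control, rate=0.5)
```

Because the fifth channel passes through untouched, the depth conditioning and the feature control both shape the result, which is one reading of the dual control effect mentioned above.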

To control the cost of generating the painting image to be generated, the painting image generation method provided by the present invention proposes a control method in which, during the sampling iterations of the Diffusion model (corresponding to iterative inference), auxiliary modules are inserted to intervene in the intermediate state (corresponding to the basic image encoding). Its main effects are:

1. No additional network to train: avoiding the training of an additional network prevents video memory usage from growing during the inference phase, also resolves the problem that multi-module expansion requires more video memory, and saves deployment cost;

2. Support for new auxiliary modules: inserting auxiliary modules to intervene during the sampling iterations not only removes the need to train an independent network for each auxiliary module, but also makes it easier to add new auxiliary modules;

3. Support for combined control by multiple auxiliary modules: different weights control the involvement of different auxiliary modules, which intervene in image generation in their respective activation areas;

4. Decoupling of the auxiliary modules from the Diffusion base model: this makes it easier to apply the auxiliary modules to a variety of Diffusion base models, and combining the training-free intermediate-state intervention control method with various base models further provides a convenient route for extension into various vertical domains.

As the above description shows, the present invention provides a painting image generation method that acquires text information and a feature control image; intervenes, based on the text encoding corresponding to the text information, in the image encoding obtained in each round of inference of the diffusion model during the multi-round iterative inference process, to obtain the basic image encoding of each round of inference; and then intervenes, based on the feature control image encoding corresponding to the feature control image, in the basic image encoding of the target round of inference until the inference process ends. The painting image to be generated, which has the image features of the feature control image, can thus be obtained automatically without introducing any additional auxiliary neural network model, which reduces the running load on the video memory and the graphics processor and lowers the cost of generating painting images.
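
The overall flow summarized above can be sketched as a single loop. Everything named here is hypothetical: `generate`, the `denoise_step` callable standing in for one text-conditioned inference round, and the `control_for_round` callable standing in for the noise-added feature control image encoding are placeholders, and the linear blend is an assumed reading of "replace and intervene". No auxiliary network appears in the loop, which is the point the summary makes.

```python
from typing import Callable, List, Sequence

Encoding = List[float]


def generate(initial_noise: Sequence[float],
             denoise_step: Callable[[Encoding, int], Encoding],
             control_for_round: Callable[[int], Encoding],
             total_rounds: int = 50,
             target_round: int = 1,
             rate: float = 0.5) -> Encoding:
    """Run the iterative inference loop, intervening from `target_round` to the end."""
    x = list(initial_noise)
    for round_idx in range(1, total_rounds + 1):
        # Text-conditioned inference yields this round's basic image encoding.
        x = denoise_step(x, round_idx)
        if round_idx >= target_round:
            # Blend in the control encoding prepared for the same round.
            ctrl = control_for_round(round_idx)
            x = [(1.0 - rate) * b + rate * c for b, c in zip(x, ctrl)]
    return x


def toy_denoiser(x: Encoding, round_idx: int) -> Encoding:
    """Stand-in for the diffusion model: simply halves the encoding."""
    return [0.5 * v for v in x]


def toy_control(round_idx: int) -> Encoding:
    """Stand-in control encoding, constant across rounds."""
    return [1.0, 1.0]


final = generate([4.0, -4.0], toy_denoiser, toy_control,
                 total_rounds=3, target_round=2, rate=0.5)
```

A real system would replace the two toy functions with the diffusion model's inference step and a per-round noised control encoding.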

Based on the same concept, the present invention further provides a painting image generation device.

The painting image generation device provided by the present invention is described below. The device described below and the painting image generation method described above may be cross-referenced.

Figure 5 is a schematic structural diagram of the painting image generation device provided by the present invention.

In an exemplary embodiment of the present invention, as shown in Figure 5, the painting image generation device may include an acquisition module 510, a text encoding module 520, an intervention module 530, an image encoding module 540 and a generation module 550, each of which is described below.

The acquisition module 510 may be configured to acquire text information and a feature control image, where the text information is text describing the image content of the painting image to be generated, and the painting image to be generated has the image features of the feature control image;

The text encoding module 520 may be configured to obtain, based on the text information, a text encoding corresponding to the text information;

The intervention module 530 may be configured to intervene, based on the text encoding, in the image encoding obtained in each round of inference of the diffusion model during the multi-round iterative inference process, to obtain the basic image encoding of each round of inference;

The image encoding module 540 may be configured to obtain, based on the feature control image, a feature control image encoding corresponding to the feature control image;

The generation module 550 may be configured to intervene, based on the feature control image encoding, in the basic image encoding of the target round of inference until the inference process ends, to obtain the painting image to be generated.

In an exemplary embodiment of the present invention, the generation module 550 may intervene, based on the feature control image encoding, in the basic image encoding of the target round of inference until the inference process ends in the following manner:

When the target round of inference precedes the reference round of inference, fully replacing and intervening in the basic image encoding according to the first preset replacement rate, based on the feature control image encoding, until the inference process ends, where the first preset replacement rate is greater than or equal to a replacement rate threshold, and the Gaussian noise of the basic image encoding obtained before the reference round of inference is greater than or equal to the noise threshold.

When the target round of inference follows the reference round of inference, locally replacing and intervening in the basic image encoding corresponding to the activation area according to the second preset replacement rate, based on the feature control image encoding, until the inference process ends, where the second preset replacement rate is less than the replacement rate threshold; the activation area is the area of the painting image to be generated in which intervention is needed, as determined from the image features of the feature control image; and the Gaussian noise of the basic image encoding obtained after the reference round of inference is less than the noise threshold.

In an exemplary embodiment of the present invention, there may be two or more feature control images and, correspondingly, two or more feature control image encodings;

The generation module 550 may intervene, based on the feature control image encoding, in the basic image encoding of the target round of inference until the inference process ends in the following manner:

Determining a replacement weight value for each feature control image encoding;

Replacing and intervening in the basic image encoding of the target round of inference according to the replacement weight values, based on the feature control image encodings, until the inference process ends.

In an exemplary embodiment of the present invention, the generation module 550 may replace and intervene in the basic image encoding of the target round of inference, based on the feature control image encoding, until the inference process ends in the following manner:

Iteratively adding noise to the feature control image encoding over multiple rounds, up to the round corresponding to the target round of inference, to obtain a noise-added feature control image encoding;

Replacing and intervening in the basic image encoding of the target round of inference, based on the noise-added feature control image encoding, until the inference process ends.

In an exemplary embodiment of the present invention, when the diffusion model is a diffusion model from depth information to image, the generation module 550 may intervene, based on the feature control image encoding, in the basic image encoding of the target round of inference until the inference process ends in the following manner:

Intervening, based on the feature control image encoding, in the target channels of the basic image encoding of the target round of inference until the inference process ends, where the target channels are the channels other than the channel corresponding to the depth information, and the depth information is the image depth information corresponding to the basic image encoding.

In an exemplary embodiment of the present invention, the feature control image may include one or more of: a feature control image with preset depth image features, a feature control image with preset edge structure image features, and a feature control image with preset pose image features.

Figure 6 is a schematic diagram of the physical structure of an electronic device. As shown in Figure 6, the electronic device may include a processor 610, a communications interface 620, a memory 630 and a communication bus 640, where the processor 610, the communications interface 620 and the memory 630 communicate with one another through the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to execute the painting image generation method, which includes: acquiring text information and a feature control image, where the text information is text describing the image content of the painting image to be generated, and the painting image to be generated has the image features of the feature control image; obtaining, based on the text information, a text encoding corresponding to the text information; intervening, based on the text encoding, in the image encoding obtained in each round of inference of the diffusion model during the multi-round iterative inference process, to obtain the basic image encoding of each round of inference; obtaining, based on the feature control image, a feature control image encoding corresponding to the feature control image; and intervening, based on the feature control image encoding, in the basic image encoding of the target round of inference until the inference process ends, to obtain the painting image to be generated.

In addition, when the logic instructions in the memory 630 are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

In another aspect, the present invention further provides a computer program product. The computer program product includes a computer program that may be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer is able to execute the painting image generation method provided by the methods above, which includes: acquiring text information and a feature control image, where the text information is text describing the image content of the painting image to be generated, and the painting image to be generated has the image features of the feature control image; obtaining, based on the text information, a text encoding corresponding to the text information; intervening, based on the text encoding, in the image encoding obtained in each round of inference of the diffusion model during the multi-round iterative inference process, to obtain the basic image encoding of each round of inference; obtaining, based on the feature control image, a feature control image encoding corresponding to the feature control image; and intervening, based on the feature control image encoding, in the basic image encoding of the target round of inference until the inference process ends, to obtain the painting image to be generated.

In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the painting image generation method provided by the methods above, which includes: acquiring text information and a feature control image, where the text information is text describing the image content of the painting image to be generated, and the painting image to be generated has the image features of the feature control image; obtaining, based on the text information, a text encoding corresponding to the text information; intervening, based on the text encoding, in the image encoding obtained in each round of inference of the diffusion model during the multi-round iterative inference process, to obtain the basic image encoding of each round of inference; obtaining, based on the feature control image, a feature control image encoding corresponding to the feature control image; and intervening, based on the feature control image encoding, in the basic image encoding of the target round of inference until the inference process ends, to obtain the painting image to be generated.

The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement it without creative effort.

From the above description of the embodiments, a person skilled in the art can clearly understand that each embodiment may be implemented by software plus a necessary general-purpose hardware platform, and of course may also be implemented by hardware. Based on this understanding, the above technical solution in essence, or the part of it that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.

It should further be understood that, although operations are depicted in the drawings of the embodiments of the present invention in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in serial order, or that all of the operations shown be performed, in order to achieve the desired results. In certain circumstances, multitasking and parallel processing may be advantageous.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A painting image generation method, characterized in that the method includes:

Acquiring text information and a feature control image, wherein the text information is text describing the image content of a painting image to be generated, and the painting image to be generated has the image features of the feature control image;

Obtaining, based on the text information, a text encoding corresponding to the text information;

Intervening, based on the text encoding, in the image encoding obtained in each round of inference of a diffusion model during a multi-round iterative inference process, to obtain a basic image encoding for each round of inference;

Obtaining, based on the feature control image, a feature control image encoding corresponding to the feature control image;

Intervening, based on the feature control image encoding, in the basic image encoding of a target round of inference until the inference process ends, to obtain the painting image to be generated.

2. The painting image generation method according to claim 1, characterized in that intervening, based on the feature control image encoding, in the basic image encoding of the target round of inference until the inference process ends specifically includes:

When the target round of inference precedes a reference round of inference, fully replacing and intervening in the basic image encoding according to a first preset replacement rate, based on the feature control image encoding, until the inference process ends, wherein the first preset replacement rate is greater than or equal to a replacement rate threshold, and the Gaussian noise of the basic image encoding obtained before the reference round of inference is greater than or equal to a noise threshold.

3. The painting image generation method according to claim 1, characterized in that intervening, based on the feature control image encoding, in the basic image encoding of the target round of inference until the inference process ends specifically includes:

When the target round of inference follows a reference round of inference, locally replacing and intervening in the basic image encoding corresponding to an activation area according to a second preset replacement rate, based on the feature control image encoding, until the inference process ends, wherein the second preset replacement rate is less than a replacement rate threshold; the activation area is the area of the painting image to be generated in which intervention is needed, as determined from the image features of the feature control image; and the Gaussian noise of the basic image encoding obtained after the reference round of inference is less than a noise threshold.

4. The painting image generation method according to any one of claims 1 to 3, characterized in that there are two or more feature control images and, correspondingly, two or more feature control image encodings;

Intervening, based on the feature control image encoding, in the basic image encoding of the target round of inference until the inference process ends specifically includes:

Determine the replacement weight value of each of the feature control image encodings respectively;

According to the replacement weight values, perform, based on the feature control image encodings, replacement intervention on the basic image encoding of the target round of inference until the inference process ends.
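One way to read the weighted replacement of claim 4 is sketched below. The claim does not specify how the replacement weight values are combined, so normalizing them and averaging the control encodings before blending is an assumption, and all names are hypothetical.

```python
import numpy as np

def combine_controls(control_latents, weights):
    """Weighted average of several control latents of identical shape.

    Assumption: weights are non-negative and are normalized to sum to 1.
    """
    w = np.asarray(weights, dtype=float)
    if w.ndim != 1 or w.shape[0] != len(control_latents):
        raise ValueError("need exactly one weight per control latent")
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    w = w / w.sum()
    stacked = np.stack(control_latents)      # shape (K, C, H, W)
    return np.tensordot(w, stacked, axes=1)  # shape (C, H, W)

def weighted_replace(base_latent, control_latents, weights, rate):
    """Blend the base latent toward the weighted mix of control latents."""
    mixed = combine_controls(control_latents, weights)
    return (1.0 - rate) * base_latent + rate * mixed
```

Under this reading, a depth control weighted 3 and a pose control weighted 1 would contribute 75% and 25% of the mixed control encoding respectively.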

5. The painting image generation method according to claim 4, characterized in that performing, based on the feature control image encoding, replacement intervention on the basic image encoding of the target round of inference until the inference process ends specifically includes:

Iteratively add noise to the feature control image encoding over multiple rounds, up to the round corresponding to the target round of inference, to obtain a noise-added feature control image encoding;

Based on the noise-added feature control image encoding, perform replacement intervention on the basic image encoding of the target round of inference until the inference process ends.
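The noise-adding step of claim 5 brings the control encoding to the same noise level as the basic image encoding it replaces. The sketch below assumes a DDPM-style forward process with a linear beta schedule, which the claim does not specify, and uses the closed-form expression for the cumulative effect of the per-round noising rather than an explicit loop. The function names and schedule parameters are hypothetical.

```python
import numpy as np

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_to_step(control_latent, t, alpha_bars, rng):
    """Noise a clean control latent to diffusion timestep t.

    Uses x_t = sqrt(a_t) * x_0 + sqrt(1 - a_t) * eps with eps ~ N(0, I),
    which has the same distribution as t + 1 successive noising rounds.
    """
    eps = rng.standard_normal(control_latent.shape)
    a = alpha_bars[t]
    return np.sqrt(a) * control_latent + np.sqrt(1.0 - a) * eps

# Usage: noise a control latent to timestep 600 of an assumed 1000-step schedule.
alpha_bars = make_alpha_bars()
rng = np.random.default_rng(0)
noised = noise_to_step(np.ones((4, 8, 8)), 600, alpha_bars, rng)
```

The noised control encoding can then take the place of the control encoding in a replacement step at the matching round.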

6. The painting image generation method according to claim 1, characterized in that, when the diffusion model is a depth-to-image diffusion model, intervening, based on the feature control image encoding, in the basic image encoding of the target round of inference until the inference process ends specifically includes:

Based on the feature control image encoding, intervene in a target channel of the basic image encoding of the target round of inference until the inference process ends, wherein the target channel is a channel other than the channel corresponding to depth information, and the depth information is the image depth information corresponding to the basic image encoding.
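The channel restriction of claim 6 can be sketched as follows. The layout assumed here, a (C, H, W) encoding with a single depth channel at a known index and a control encoding of the same shape, is an illustration only; the claim does not fix the channel layout, and the function name is hypothetical.

```python
import numpy as np

def replace_non_depth_channels(base_latent, control_latent,
                               depth_channel, rate):
    """Blend every channel except the depth channel toward the control.

    base_latent and control_latent are float arrays of shape (C, H, W);
    depth_channel is the assumed index of the channel carrying depth
    information, which is left exactly as it is in the base latent.
    """
    target = np.ones(base_latent.shape[0], dtype=bool)
    target[depth_channel] = False
    out = base_latent.copy()
    out[target] = ((1.0 - rate) * base_latent[target]
                   + rate * control_latent[target])
    return out
```

Leaving the depth channel untouched keeps the conditioning that a depth-to-image model relies on, so the intervention changes appearance without altering the depth input.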

7. The painting image generation method according to claim 1, characterized in that the feature control image includes one or more of: a feature control image having preset depth image features, a feature control image having preset edge structure image features, and a feature control image having preset pose image features.

8. A painting image generating device, characterized in that the device includes:

An acquisition module, configured to acquire text information and a feature control image, where the text information is text describing the picture content of the painting image to be generated, and the painting image to be generated has the image features of the feature control image;

A text encoding module, configured to obtain a text encoding corresponding to the text information based on the text information;

An intervention module, configured to intervene, based on the text encoding, in the image encoding obtained in each round of inference during the multi-round iterative inference of the diffusion model, to obtain the basic image encoding of each round of inference;

An image encoding module, configured to obtain a feature control image encoding corresponding to the feature control image based on the feature control image;

A generation module, configured to intervene, based on the feature control image encoding, in the basic image encoding of a target round of inference until the inference process ends, to obtain the painting image to be generated.

9. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that when the processor executes the program, the painting image generation method according to any one of claims 1 to 7 is implemented.

10. A non-transitory computer-readable storage medium with a computer program stored thereon, characterized in that when the computer program is executed by a processor, the painting image generation method according to any one of claims 1 to 7 is implemented.