CN116721186A - Drawing image generation method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN116721186A
Authority
CN
China
Prior art keywords
image
reasoning
round
control image
feature control
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311003292.XA
Other languages
Chinese (zh)
Other versions
CN116721186B (en)
Inventor
陈煌榕
门征
徐元春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Hongmian Xiaoice Technology Co Ltd
Original Assignee
Beijing Hongmian Xiaoice Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Hongmian Xiaoice Technology Co Ltd
Priority to CN202311003292.XA
Publication of CN116721186A
Application granted
Publication of CN116721186B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/60 Editing figures and text; Combining figures or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a drawing image generation method and device, an electronic device, and a storage medium, and relates to the technical field of artificial intelligence. The method comprises: acquiring text information and a feature control image, wherein the text information is text describing the picture content of a drawing image to be generated, and the drawing image to be generated has the image features of the feature control image; obtaining, based on the text information, a text code corresponding to the text information; intervening, based on the text code, in the image code obtained in each round of inference during the multi-round iterative inference of a diffusion model, to obtain a base image code for each round of inference; obtaining, based on the feature control image, a feature control image code corresponding to the feature control image; and intervening, based on the feature control image code, in the base image code of a target inference round until the inference process ends, to obtain the drawing image to be generated. The method reduces the load on video memory and the graphics processor and lowers the cost of generating drawing images.

Description

Drawing image generation method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence, and in particular, to a method and apparatus for generating a drawing image, an electronic device, and a storage medium.
Background
Diffusion models are known in the art as a main tool for generating drawing images. Compared with other generative models, a diffusion model denoises gradually over multiple iterations, so the drawing images it generates are of higher quality.
However, controlling the content of the generated drawing image currently requires integrating an auxiliary neural network model into the diffusion model. Introducing such auxiliary models during generation increases the load on video memory and the graphics processor, and therefore increases the cost of generating the drawing image.
Disclosure of Invention
The invention provides a drawing image generation method and device, an electronic device, and a storage medium that can automatically obtain a drawing image to be generated having the image features of a feature control image without introducing any additional auxiliary neural network model, thereby reducing the load on video memory and the graphics processor and lowering the cost of generating the drawing image.
The invention provides a drawing image generation method, comprising: acquiring text information and a feature control image, wherein the text information is text describing the picture content of a drawing image to be generated, and the drawing image to be generated has the image features of the feature control image; obtaining, based on the text information, a text code corresponding to the text information; intervening, based on the text code, in the image code obtained in each round of inference during the multi-round iterative inference of a diffusion model, to obtain a base image code for each round of inference; obtaining, based on the feature control image, a feature control image code corresponding to the feature control image; and intervening, based on the feature control image code, in the base image code of a target inference round until the inference process ends, to obtain the drawing image to be generated.
According to the drawing image generation method provided by the invention, intervening, based on the feature control image code, in the base image code of the target inference round until the inference process ends specifically comprises: when the target inference round precedes a reference inference round, performing global replacement intervention on the base image code based on the feature control image code at a first preset replacement rate until the inference process ends, wherein the first preset replacement rate is greater than or equal to a replacement rate threshold, and the Gaussian noise of the base image code obtained before the reference inference round is greater than or equal to a noise threshold.
According to the drawing image generation method provided by the invention, intervening, based on the feature control image code, in the base image code of the target inference round until the inference process ends specifically comprises: when the target inference round follows the reference inference round, performing local replacement intervention, based on the feature control image code, on the base image code corresponding to an activation area at a second preset replacement rate until the inference process ends, wherein the second preset replacement rate is less than the replacement rate threshold; the activation area is the area of the drawing image to be generated that is determined, according to the image features of the feature control image, to require intervention; and the Gaussian noise of the base image code obtained after the reference inference round is less than the noise threshold.
According to the drawing image generation method provided by the invention, there are two or more feature control images and, correspondingly, two or more feature control image codes; intervening, based on the feature control image code, in the base image code of the target inference round until the inference process ends specifically comprises: determining a replacement weight value for each feature control image code; and performing replacement intervention on the base image code of the target inference round based on the feature control image codes according to the replacement weight values until the inference process ends.
According to the drawing image generation method provided by the invention, performing replacement intervention on the base image code of the target inference round based on the feature control image code until the inference process ends specifically comprises: iteratively adding noise to the feature control image code up to the round corresponding to the target inference round, to obtain a noised feature control image code; and performing replacement intervention on the base image code of the target inference round based on the noised feature control image code until the inference process ends.
According to the drawing image generation method provided by the invention, when the diffusion model is a depth-to-image diffusion model, intervening, based on the feature control image code, in the base image code of the target inference round until the inference process ends specifically comprises: intervening, based on the feature control image code, in target channels of the base image code of the target inference round until the inference process ends, wherein the target channels are the channels other than the channel corresponding to depth information, and the depth information is the image depth information corresponding to the base image code.
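The channel-restricted intervention for a depth-to-image diffusion model can be illustrated with a minimal sketch. It assumes that an image code is a list of channels (each a flat list of floats), that the depth channel index is known, and that "replacement at a rate" means a linear blend; the function name and these representations are hypothetical and not taken from the patent.

```python
def intervene_non_depth_channels(base_code, control_code, depth_channel, rate):
    """Blend control_code into base_code on every channel except depth_channel.

    base_code and control_code are lists of channels; each channel is a flat
    list of floats. rate is the replacement rate in [0, 1] (assumed to act as
    a linear blend weight).
    """
    if len(base_code) != len(control_code):
        raise ValueError("codes must have the same number of channels")
    result = []
    for index, (base_ch, ctrl_ch) in enumerate(zip(base_code, control_code)):
        if index == depth_channel:
            # The depth channel is left untouched.
            result.append(list(base_ch))
        else:
            result.append([(1.0 - rate) * b + rate * c
                           for b, c in zip(base_ch, ctrl_ch)])
    return result


base = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
control = [[1.0, 1.0], [5.0, 5.0], [4.0, 4.0]]
mixed = intervene_non_depth_channels(base, control, depth_channel=2, rate=0.5)
```

Here the third channel plays the role of the depth channel, so only the first two channels are modified.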
According to the drawing image generation method provided by the invention, the characteristic control image comprises one or more of a characteristic control image with preset depth image characteristics, a characteristic control image with preset edge structure image characteristics and a characteristic control image with preset pose image characteristics.
The invention also provides a drawing image generating device, comprising: an acquisition module, configured to acquire text information and a feature control image, wherein the text information is text describing the picture content of a drawing image to be generated, and the drawing image to be generated has the image features of the feature control image; a text encoding module, configured to obtain, based on the text information, a text code corresponding to the text information; an intervention module, configured to intervene, based on the text code, in the image code obtained in each round of inference during the multi-round iterative inference of a diffusion model, to obtain a base image code for each round of inference; an image encoding module, configured to obtain, based on the feature control image, a feature control image code corresponding to the feature control image; and a generation module, configured to intervene, based on the feature control image code, in the base image code of a target inference round until the inference process ends, to obtain the drawing image to be generated.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the drawing image generation method as described in any one of the above when executing the program.
The present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a pictorial image generation method as described in any of the above.
The invention also provides a computer program product comprising a computer program which when executed by a processor implements a pictorial image generation method as described in any of the above.
The drawing image generation method and device, electronic device, and storage medium provided by the invention acquire text information and a feature control image; intervene, based on the text code corresponding to the text information, in the image code obtained in each round of inference during the multi-round iterative inference of the diffusion model, to obtain the base image code of each round; and intervene, based on the feature control image code corresponding to the feature control image, in the base image code of the target inference round until the inference process ends. A drawing image to be generated that has the image features of the feature control image can thus be obtained automatically without introducing any additional auxiliary neural network model, which reduces the load on video memory and the graphics processor and lowers the cost of generating the drawing image.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of the drawing image generation method provided by the invention;
FIG. 2 is a schematic flow chart, provided by the invention, of intervening in the base image code of the target inference round based on the feature control image code until the inference process ends;
FIG. 3 is a schematic flow chart, provided by the invention, of performing replacement intervention on the base image code of the target inference round based on the feature control image code until the inference process ends;
FIG. 4 is a schematic diagram of an application scenario of the drawing image generation method provided by the invention;
FIG. 5 is a schematic structural diagram of the drawing image generating device provided by the invention;
FIG. 6 is a schematic structural diagram of the electronic device provided by the invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Fig. 1 is a schematic flow chart of a drawing image generating method provided by the invention.
In order to further describe the drawing image generation method provided by the present invention, the following description will be made with reference to the following embodiments.
In an exemplary embodiment of the present invention, as can be seen in fig. 1, the drawing image generating method may include steps 110 to 150, and each step will be described below.
In step 110, text information and a feature control image are acquired.
The text information is a text describing an image picture of the painting image to be generated; the painting image to be generated has image features of the feature control image.
It will be appreciated that the text information describes the overall picture of the drawing image to be generated; in other words, the drawing image to be generated can be regarded as the text information rendered as an image.
Besides the picture content described by the text information, the drawing image to be generated also has corresponding image features, such as structural features, color features, pose features of the elements in the image, or depth information features. In application, the image features of the drawing image can be intervened in according to the image features of a feature control image provided in advance. It will be appreciated that a drawing image obtained through intervention based on the feature control image also has the image features of the feature control image.
In still another exemplary embodiment of the present invention, the feature control image may include one or more of a feature control image having a preset depth image feature, a feature control image having a preset edge structure image feature, and a feature control image having a preset pose image feature.
In the present embodiment, the specific content of the feature control image is not limited, and it may be determined according to the actual demand of the user.
In step 120, a text code corresponding to the text information is obtained based on the text information.
In one embodiment, the text information may be encoded by a language model, such as a CLIP model or a T5 model, to obtain the text code corresponding to the text information. It will be appreciated that the text code serves as a conditioning factor in the subsequent multi-round iterative inference of the diffusion model.
In step 130, based on the text code, intervention is performed on the image code obtained in each round of inference during the multi-round iterative inference of the diffusion model, to obtain the base image code of each round of inference.
It should be noted that each round of the diffusion model's multi-round iterative inference yields an image code; that is, the image code is what the diffusion model obtains by iterating from noise. In application, the image code can be intervened in based on the text code to obtain the base image code of each round of inference. Because the base image codes result from text code intervention, each one can be regarded as an intermediate state in deriving the drawing image to be generated through diffusion model inference.
In yet another embodiment, a cross-attention model may be used to fuse the text code with the image code, thereby realizing the intervention of the text code in the image code and obtaining the base image code.
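A common way to fuse a text code into an image code in diffusion models is cross-attention, in which image tokens act as queries and text tokens act as keys and values. The sketch below assumes that mechanism and, for brevity, omits the learned projection matrices a real model would use; the token representation and function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(image_tokens, text_tokens):
    """Each image token (query) attends over the text tokens (keys and values)."""
    dim = len(text_tokens[0])
    fused = []
    for query in image_tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in text_tokens]
        weights = softmax(scores)
        attended = [sum(w * tok[d] for w, tok in zip(weights, text_tokens))
                    for d in range(dim)]
        # Residual connection: keep the image token, add the text-conditioned update.
        fused.append([q + a for q, a in zip(query, attended)])
    return fused


image_tokens = [[1.0, 0.0], [0.0, 1.0]]
text_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
base_image_code = cross_attention(image_tokens, text_tokens)
```

The output has the same shape as the image tokens, so it can stand in for the image code in the next inference round.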
In step 140, a feature control image code corresponding to the feature control image is obtained based on the feature control image.
In step 150, based on the feature control image code, intervention is performed on the base image code of the target inference round until the inference process ends, to obtain the drawing image to be generated.
In one embodiment, features may be extracted from the feature control image to obtain the feature control image code corresponding to the feature control image. The base image code obtained in each round of inference may then be intervened in using the feature control image code, with iterative inference continuing until the inference process ends.
In yet another example, the intervention may instead begin at the base image code obtained in a target inference round and continue through the iterative inference until the inference process ends. The target inference round can be chosen according to the actual situation. Taking an iterative inference process of the diffusion model with 50 rounds as an example, the target inference round may be any one of the 50 rounds. When the target round is the 1st round, intervention based on the feature control image code begins from the base image code of the first round, and iterative inference proceeds until all 50 rounds are finished, yielding the drawing image to be generated with the image features of the feature control image. According to this embodiment, the drawing image to be generated with the image features of the feature control image can be obtained automatically without introducing any additional auxiliary neural network model, which reduces the load on video memory and the graphics processor and lowers the cost of generating the drawing image.
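The control flow of steps 130 to 150 can be sketched as a loop in which intervention starts at the target round. This is only an illustration: `denoise_step` is a hypothetical stand-in for one text-conditioned diffusion inference round, image codes are flat lists of floats, and replacement is assumed to be a linear blend at a fixed rate.

```python
import random


def denoise_step(code, text_code, round_index):
    """Hypothetical stand-in for one text-conditioned inference round."""
    return [0.9 * value + 0.1 * text_code[i % len(text_code)]
            for i, value in enumerate(code)]


def blend(base_code, control_code, rate):
    """Replace a fraction `rate` of the base code with the control code."""
    return [(1.0 - rate) * b + rate * c for b, c in zip(base_code, control_code)]


def generate(text_code, control_code, total_rounds=50, target_round=1,
             rate=0.5, seed=0):
    rng = random.Random(seed)
    # Initial image code: random Gaussian noise.
    code = [rng.gauss(0.0, 1.0) for _ in control_code]
    intervened_rounds = []
    for round_index in range(1, total_rounds + 1):
        # Base image code of this round (text-conditioned).
        code = denoise_step(code, text_code, round_index)
        # Intervene from the target round until inference ends.
        if round_index >= target_round:
            code = blend(code, control_code, rate)
            intervened_rounds.append(round_index)
    return code, intervened_rounds


final_code, rounds = generate([0.2, 0.4], [1.0, 1.0, 1.0, 1.0], target_round=10)
```

With `target_round=10`, rounds 1 to 9 run without intervention and rounds 10 to 50 are intervened in.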
It should be noted that the current approach of obtaining a drawing image with the image features of a feature control image by introducing an auxiliary neural network model requires training a separate auxiliary neural network model, which is time-consuming and labor-intensive; in addition, the auxiliary module plug-in corresponding to the auxiliary neural network model may not fit the diffusion model into which it is inserted. In contrast, the drawing image generation method provided by the invention intervenes in the intermediate states obtained during the iterative inference of the diffusion model, for example the base image codes, until the inference process ends. The drawing image to be generated with the image features of the feature control image can thus be obtained automatically without introducing any additional auxiliary neural network model, which avoids the time and labor of training a separate auxiliary model, resolves the mismatch between the auxiliary module plug-in and the diffusion model, and decouples the auxiliary module plug-in from the diffusion model. The auxiliary module is understood to be a processing module capable of intervening so that the drawing image to be generated has the image features of the feature control image.
The feature control image can be adjusted according to the user's needs, and the target round can likewise be chosen according to the user's needs. The intervention on the base image code based on the feature control image can therefore be controlled automatically and personalized during drawing image generation, yielding a drawing image to be generated that meets the user's needs.
According to the drawing image generation method provided by the invention, text information and a feature control image are acquired; based on the text code corresponding to the text information, intervention is performed on the image code obtained in each round of inference during the multi-round iterative inference of the diffusion model, to obtain the base image code of each round; and based on the feature control image code corresponding to the feature control image, intervention is performed on the base image code of the target inference round until the inference process ends. A drawing image to be generated that has the image features of the feature control image can thus be obtained automatically without introducing any additional auxiliary neural network model, which reduces the load on video memory and the graphics processor and lowers the cost of generating the drawing image.
FIG. 2 is a schematic flow chart, provided by the invention, of intervening in the base image code of the target inference round based on the feature control image code until the inference process ends.
To further describe the painting image generation method provided by the present invention, a description will be given below with reference to fig. 2.
In an exemplary embodiment of the present invention, as can be seen in connection with FIG. 2, intervening in the base image code of the target inference round based on the feature control image code until the inference process ends may comprise step 210 and step 220, each of which is described below.
In step 210, when the target inference round precedes the reference inference round, global replacement intervention is performed on the base image code based on the feature control image code at the first preset replacement rate until the inference process ends.
In step 220, when the target inference round follows the reference inference round, local replacement intervention is performed, based on the feature control image code, on the base image code corresponding to the activation area at the second preset replacement rate until the inference process ends.
Here, the first preset replacement rate is greater than or equal to a replacement rate threshold; the second preset replacement rate is less than the replacement rate threshold; the activation area is the area of the drawing image to be generated that is determined, according to the image features of the feature control image, to require intervention; the Gaussian noise of the base image code obtained before the reference inference round is greater than or equal to a noise threshold; and the Gaussian noise of the base image code obtained after the reference inference round is less than the noise threshold.
In one embodiment, the DDIM algorithm may be used as an optimization that speeds up inference and reduces the number of iteration rounds. Continuing the example in which the iterative inference of the diffusion model takes 50 rounds: during the first 25 rounds the Gaussian noise is pronounced and the actual content is hardly visible, while during the last 25 rounds the content gradually becomes clear. Deep intervention is therefore performed in the first 25 rounds, with a global replacement at a replacement rate of 0.7 in each round. Shallow intervention is performed in the last 25 rounds, with replacement applied only to the activation area at a replacement rate of 0.4. Consequently, with the drawing image generation method provided by the invention, introducing a new auxiliary module for controlled generation does not require training a separate auxiliary network.
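The two-phase schedule in this example can be written as a small lookup, sketched below. The function name and return format are hypothetical, and treating round 25 itself as part of the global phase is an assumption based on the "first 25 rounds" wording.

```python
def replacement_plan(round_index, reference_round=25,
                     global_rate=0.7, local_rate=0.4):
    """Return (scope, rate) for a 1-based inference round.

    Rounds up to and including reference_round use global replacement;
    later rounds replace only the activation area (assumed boundary).
    """
    if round_index <= reference_round:
        return ("global", global_rate)
    return ("local", local_rate)


plan = [replacement_plan(r) for r in range(1, 51)]
```

Each entry of `plan` tells the inference loop whether to replace the whole base image code or only the activation area in that round, and at what rate.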
In yet another embodiment, the reference round may be understood as the 25th round described above. Because the Gaussian noise of the base image code obtained before the reference inference round is greater than or equal to the noise threshold, when the target inference round precedes the reference inference round, global replacement intervention may be performed on the base image code at the first preset replacement rate until the iterative inference process ends. The first preset replacement rate may correspond to the 0.7 described above, but is not specifically limited in this embodiment. It should be noted that, when the total number of iterations is 50, the first preset replacement rate may be set to 0.5 or more, and the second preset replacement rate may be set to 0.5 or less.
In yet another embodiment, because the Gaussian noise of the base image code obtained after the reference inference round is less than the noise threshold, when the target inference round follows the reference inference round, local replacement intervention may be performed on the base image code corresponding to the activation area at the second preset replacement rate until the iterative inference process ends. The second preset replacement rate may correspond to the 0.4 described above, but is not specifically limited in this embodiment.
As an example of the activation area: if it is determined from the image features of the feature control image that the background of the drawing image to be generated needs intervention, the activation area is the area corresponding to that background.
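Local replacement restricted to an activation area can be sketched with a boolean mask. The mask representation, the function name, and the interpretation of the replacement rate as a linear blend weight are all assumptions made for illustration.

```python
def replace_in_activation_area(base_code, control_code, mask, rate):
    """Blend control_code into base_code only where mask is True."""
    return [
        (1.0 - rate) * b + rate * c if active else b
        for b, c, active in zip(base_code, control_code, mask)
    ]


base = [0.0, 0.0, 0.0, 0.0]
control = [1.0, 1.0, 1.0, 1.0]
# Hypothetical mask: the first two positions belong to the background.
background_mask = [True, True, False, False]
updated = replace_in_activation_area(base, control, background_mask, rate=0.4)
```

Positions outside the mask keep their base values, so only the background is pulled toward the feature control image code.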
FIG. 3 is a schematic flow chart, provided by the invention, of performing replacement intervention on the base image code of the target inference round based on the feature control image code until the inference process ends.
The embodiment shown in FIG. 3 is explained below.
In an exemplary embodiment of the present invention, there may be two or more feature control images and, correspondingly, two or more feature control image codes. As can be seen in connection with FIG. 3, intervening in the base image code of the target inference round based on the feature control image code until the inference process ends may comprise step 310 and step 320, each of which is described below.
In step 310, a replacement weight value is determined for each feature control image code;
In step 320, replacement intervention is performed on the base image code of the target inference round based on the feature control image codes according to the replacement weight values until the inference process ends.
In one embodiment, when there are two or more feature control image codes, directly adding them and replacing the corresponding positions of the base image code may produce excessively high values. In application, therefore, a replacement weight value may be determined for each feature control image code, and replacement intervention may be performed on the base image code of the target inference round based on the feature control image codes according to the replacement weight values until the inference process ends.
In yet another embodiment, replacement intervention may also be performed on the base image code of the target inference round based on the feature control image codes according to the replacement weight values until the inference process ends, where the base image code may be understood as the base image code corresponding to the activation area.
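One way to keep the combined values from growing too large is to normalize the replacement weights so that they sum to 1 before combining the feature control image codes. The sketch below assumes that normalization, which the patent does not prescribe; the function name and the flat-list representation are hypothetical.

```python
def combine_control_codes(control_codes, weights):
    """Weighted average of several feature control image codes."""
    if len(control_codes) != len(weights) or not control_codes:
        raise ValueError("need one weight per control code")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalizing keeps the result within the range of the inputs.
    normalized = [w / total for w in weights]
    length = len(control_codes[0])
    return [
        sum(w * code[i] for w, code in zip(normalized, control_codes))
        for i in range(length)
    ]


depth_code = [2.0, 2.0]
edge_code = [4.0, 0.0]
combined = combine_control_codes([depth_code, edge_code], weights=[3.0, 1.0])
```

The combined code can then be used in place of a single feature control image code when replacing the base image code.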
In yet another exemplary embodiment of the present invention, replacing and intervening on the base image code of the target round inference based on the feature control image code until the inference process is finished may also be implemented in the following manner:
performing multiple iterations of noise adding on the feature control image code, up to the round corresponding to the target round inference, to obtain a noised feature control image code;
and, based on the noised feature control image code, replacing and intervening on the base image code of the target round inference until the inference process is finished.
In one embodiment, the feature control image code corresponding to the feature control image may be subjected to multiple iterations of noise adding, up to the round corresponding to the target round inference, which may be understood as the reverse of the inference process, so as to obtain a noised feature control image code. The base image code of the target round inference can then be replaced and intervened on based on the noised feature control image code until the iterative inference process is finished. Because the feature control image code is noised before it is used for replacement and intervention, the feature control image code and the base image code correspond to the same inference round in the iterative inference process, and they remain in round correspondence during the intervention of every round after the target round inference, so that the replacement and intervention on the base image code can be better realized.
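The iterative noise adding can be sketched as follows. This is a minimal illustration that assumes the standard DDPM forward step, in which each round scales the code by sqrt(1 - beta) and adds Gaussian noise scaled by sqrt(beta). The patent does not state its noise schedule, so `betas` and the function name `add_noise_to_round` are hypothetical.

```python
import math
import random
from typing import List


def add_noise_to_round(
    x0: List[float], betas: List[float], t: int, rng: random.Random
) -> List[float]:
    """Apply t forward noising steps to the clean feature control image code.

    Assumption: each step follows the DDPM forward kernel
    x_s = sqrt(1 - beta_s) * x_{s-1} + sqrt(beta_s) * eps, with eps ~ N(0, 1).
    """
    if t > len(betas):
        raise ValueError("t exceeds the length of the noise schedule")
    x = list(x0)
    for s in range(t):
        keep = math.sqrt(1.0 - betas[s])   # share of the previous code kept
        spread = math.sqrt(betas[s])       # standard deviation of added noise
        x = [keep * v + spread * rng.gauss(0.0, 1.0) for v in x]
    return x
```

With `t` set to the target round, the returned code has the same nominal noise level as the base image code at that round, which is the round correspondence the paragraph above relies on.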
Fig. 4 is a schematic view of an application scenario of the painting image generating method provided by the invention.
The process of replacing and intervening on the base image code of the target round inference based on the feature control image code, until the inference process is finished, is further described below with reference to fig. 4.
In one embodiment, with reference to fig. 4, the path from X_T to X_0 can be understood as the 50 inference rounds of the iterative inference process based on the Diffusion model, where X_T is the image code corresponding to the random noise at the start of Diffusion model inference, and X_0 can be understood as the feature control image code corresponding to the feature control image. When the target round inference is round t in fig. 4, the feature control image code can be subjected to multiple rounds of iterative noise adding, up to the round corresponding to the target round inference (round t), to obtain a noised feature control image code, where the noised feature control image code may correspond to q(x_t | x_{t-1}) in fig. 4 and the base image code may correspond to p(x_{t-1} | x_t). In application, when the feature control image code intervenes at round t, it is first brought to round t by adding noise in the reverse direction. Further, in rounds t to 0, that is, in each intermediate state from round t to round t-1, different replacement weight values can be set for different feature control image codes, and the intervention and replacement can be performed according to those replacement weight values.
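For orientation, the two kernels named in fig. 4 are conventionally written as follows in the standard DDPM formulation. The patent text does not spell them out, so the noise schedule \beta_s, the cumulative product \bar\alpha_t, and the learned mean and covariance \mu_\theta and \Sigma_\theta are assumptions borrowed from that formulation rather than definitions given by the patent.

```latex
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right),
\qquad
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t)\,I\right),
\quad \bar\alpha_t = \prod_{s=1}^{t}(1-\beta_s)

p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)
```

Under these assumptions, noising the feature control image code to round t amounts to sampling from q(x_t | x_0), either step by step or in closed form.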
In an example, where the feature control image codes are a feature control image code corresponding to canny and a feature control image code corresponding to pose, the replacement weight value of the feature control image code corresponding to canny may be set to 0.6 and the replacement weight value of the feature control image code corresponding to pose may be set to 0.4.
In still another exemplary embodiment of the present invention, in the case where the diffusion model is a diffusion model from depth information to an Image (corresponding to the manner of generating depth to Image), continuing to describe the embodiment illustrated in fig. 1, the intervention of the basic Image coding of the target round inference based on the feature control Image coding until the inference process ends (corresponding to step 150) may be implemented in the following manner:
based on the feature control image code, intervening on a target channel of the base image code of the target round inference until the inference process is finished, where the target channel is any channel other than the channel corresponding to the depth information, and the depth information is the image depth information corresponding to the base image code.
In one embodiment, in the depth to Image generation mode, under the UNet framework, the intervention may be applied only to the first four channels during inference while the depth information in the fifth channel is left unchanged, so that control by both the depth information and the feature control image is achieved. The first four channels may correspond to the target channels described above, and the fifth channel may correspond to the channel corresponding to the depth information described above.
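The channel-selective intervention can be sketched as below. The sketch assumes a five-channel base image code whose last channel carries the depth information and a four-channel feature control image code, and it assumes that intervention on a channel means replacing it outright. The function name and the nested-list layout are hypothetical.

```python
from typing import List

Channel = List[float]  # one flattened channel, for illustration only


def intervene_target_channels(
    base: List[Channel], control: List[Channel], depth_index: int = 4
) -> List[Channel]:
    """Replace every channel of `base` except the depth channel.

    Assumption: `control` has exactly one channel per non-depth channel of
    `base`, in the same order.
    """
    targets = [c for c in range(len(base)) if c != depth_index]
    if len(control) != len(targets):
        raise ValueError("control must have one channel per target channel")
    out = [list(ch) for ch in base]
    for src, dst in enumerate(targets):
        out[dst] = list(control[src])  # intervene on a target channel
    return out  # out[depth_index] still holds the original depth information
```

Because the depth channel is copied through untouched, the depth conditioning of the model stays in force while the other channels take on the feature control image code.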
To control the cost of generating the drawing image to be generated, the invention provides a control method that uses a Diffusion model and inserts auxiliary modules to intervene on the intermediate state (corresponding to the base image code) during the sampling iterations (corresponding to the iterative inference). Its main effects are:
1. no additional network training: no additional network needs to be trained, which avoids an increase in video memory in the inference stage, solves the problem that multi-module expansion requires more video memory, and saves deployment cost;
2. support for new auxiliary module extensions: the auxiliary module is inserted in the sampling iterations to intervene, so that no separate network training is needed for the auxiliary module, which facilitates the expansion of new auxiliary modules;
3. support for combined control by multiple auxiliary modules: different weights are used to control the intervention of different auxiliary modules, and image generation is intervened on in the corresponding activation regions;
4. the auxiliary module is decoupled from the Diffusion base model: the auxiliary module can readily be applied to different Diffusion base models, and combining the training-free intermediate state intervention control method with various base models further provides a convenient way to expand various sags.
As described above, the invention provides a drawing image generation method. Text information and a feature control image are acquired. Based on the text code corresponding to the text information, the image code obtained by each round of inference of the diffusion model in the multi-round iterative inference process is intervened on, yielding the base image code of each round of inference. Then, based on the feature control image code corresponding to the feature control image, the base image code of the target round inference is intervened on until the inference process is finished. In this way, a drawing image to be generated that has the image features of the feature control image can be obtained automatically without introducing additional auxiliary neural network models, which reduces the video memory usage and the running load of the graphics processor and thus the cost of generating the drawing image.
Based on the same conception, the invention also provides a drawing image generating device.
The painting-image generating apparatus provided by the present invention will be described below, and the painting-image generating apparatus described below and the painting-image generating method described above may be referred to correspondingly to each other.
Fig. 5 is a schematic structural view of a pictorial image generating apparatus provided by the present invention.
In an exemplary embodiment of the present invention, as can be seen in conjunction with fig. 5, the pictorial image generating apparatus may include an acquisition module 510, a text encoding module 520, an intervention module 530, an image encoding module 540, and a generation module 550, each of which will be described separately below.
The obtaining module 510 may be configured to obtain text information and a feature control image, where the text information is text describing an image frame of a drawing image to be generated, and the drawing image to be generated has image features of the feature control image;
the text encoding module 520 may be configured to obtain a text encoding corresponding to the text information based on the text information;
the intervention module 530 may be configured to, based on the text encoding, intervene on the image encoding obtained by each round of reasoning in the multiple rounds of iterative reasoning process of the diffusion model, to obtain a base image encoding of each round of reasoning;
the image encoding module 540 may be configured to obtain a feature control image encoding corresponding to the feature control image based on the feature control image;
the generating module 550 may be configured to intervene, based on the feature control image code, on the base image code of the target round inference until the inference process ends, to obtain the drawing image to be generated.
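How the five modules could be wired together is sketched below. This is an illustration only: the class name, the callable signatures, the `decode` step, and the convention that rounds count down from `rounds` to 1 with intervention applied from `target_round` onward are all assumptions, not details given by the patent.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

Code = List[float]


@dataclass
class PaintingImageGenerator:
    encode_text: Callable[[str], Code]                    # text encoding module 520
    encode_image: Callable[[Any], Code]                   # image encoding module 540
    text_guided_step: Callable[[Code, Code, int], Code]   # intervention module 530
    control_intervene: Callable[[Code, Code, int], Code]  # generating module 550
    decode: Callable[[Code], Any]                         # hypothetical decoder

    def generate(self, text: str, control_image: Any, initial_code: Code,
                 rounds: int, target_round: int) -> Any:
        # The acquisition module 510 corresponds to the text and control_image
        # arguments handed to this method.
        text_code = self.encode_text(text)
        control_code = self.encode_image(control_image)
        x = initial_code
        for r in range(rounds, 0, -1):
            # Text-conditioned round: yields the base image code of round r.
            x = self.text_guided_step(x, text_code, r)
            # From the target round until the end, intervene on the base code.
            if r <= target_round:
                x = self.control_intervene(x, control_code, r)
        return self.decode(x)
```

Keeping each module as a plain callable mirrors the decoupling the description emphasises, since any of them can be swapped without touching the loop.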
In an exemplary embodiment of the present invention, the generating module 550 may intervene, based on the feature control image code, on the base image code of the target round inference until the inference process ends in the following manner:
in the case that the target round inference is before the reference round inference, globally replacing and intervening on the base image code, based on the feature control image code and according to a first preset replacement rate, until the inference process is finished, wherein the first preset replacement rate is greater than or equal to a replacement rate threshold, and the Gaussian noise of the base image code obtained before the reference round inference is greater than or equal to a noise threshold.
In an exemplary embodiment of the present invention, the generating module 550 may intervene, based on the feature control image code, on the base image code of the target round inference until the inference process ends in the following manner:
in the case that the target round inference is after the reference round inference, locally replacing and intervening on the base image code corresponding to the activation region, based on the feature control image code and according to a second preset replacement rate, until the inference process is finished, wherein the second preset replacement rate is less than the replacement rate threshold; the activation region is the region of the drawing image to be generated that needs to be intervened on, determined according to the image features of the feature control image; and the Gaussian noise of the base image code obtained after the reference round inference is less than the noise threshold.
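The two cases can be contrasted in a short sketch. It assumes that the position relative to the reference round is decided by comparing the current Gaussian noise level with the noise threshold, and that a "replacement rate" acts as a blending ratio between the base image code and the feature control image code. Both interpretations, and every name in the code, are hypothetical.

```python
from typing import List


def intervene_by_round(
    base: List[float],
    control: List[float],
    noise_level: float,
    noise_threshold: float,
    first_rate: float,
    second_rate: float,
    rate_threshold: float,
    activation_mask: List[bool],
) -> List[float]:
    """Global replacement at high noise, local replacement at low noise."""
    if first_rate < rate_threshold or second_rate >= rate_threshold:
        raise ValueError("rates must satisfy first >= threshold > second")
    if noise_level >= noise_threshold:
        # Before the reference round: replace everywhere at the higher rate.
        rate, mask = first_rate, [True] * len(base)
    else:
        # After the reference round: replace only inside the activation region.
        rate, mask = second_rate, activation_mask
    return [
        (1.0 - rate) * b + rate * c if m else b
        for b, c, m in zip(base, control, mask)
    ]
```

Strong global replacement while the code is still mostly noise fixes the overall layout, and weak local replacement later adjusts only the activation region.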
In an exemplary embodiment of the present invention, the feature control image may include two or more, and the corresponding feature control image code may include two or more;
the generating module 550 may intervene, based on the feature control image codes, on the base image code of the target round inference until the inference process ends in the following manner:
respectively determining a replacement weight value of each feature control image code;
and replacing and intervening the basic image codes of the target round reasoning based on the characteristic control image codes according to the replacement weight values until the reasoning process is finished.
In an exemplary embodiment of the present invention, the generation module 550 may implement the replacement and intervention of the base image code of the target round inference based on the feature control image code until the inference process ends in the following manner:
performing multiple iterations of noise adding on the feature control image code, up to the round corresponding to the target round inference, to obtain a noised feature control image code;
and, based on the noised feature control image code, replacing and intervening on the base image code of the target round inference until the inference process is finished.
In an exemplary embodiment of the present invention, in the case where the diffusion model is a diffusion model from depth information to an image, the generating module 550 may intervene, based on the feature control image code, on the base image code of the target round inference until the inference process ends in the following manner:
based on the feature control image code, intervening on a target channel of the base image code of the target round inference until the inference process is finished, where the target channel is any channel other than the channel corresponding to the depth information, and the depth information is the image depth information corresponding to the base image code.
In an exemplary embodiment of the present invention, the feature control image may include one or more of a feature control image having a preset depth image feature, a feature control image having a preset edge structure image feature, and a feature control image having a preset pose image feature.
Fig. 6 illustrates a physical schematic diagram of an electronic device, as shown in fig. 6, which may include: processor 610, communication interface (Communications Interface) 620, memory 630, and communication bus 640, wherein processor 610, communication interface 620, and memory 630 communicate with each other via communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform a pictorial image generation method comprising: acquiring text information and a feature control image, wherein the text information is text of an image picture describing a drawing image to be generated, and the drawing image to be generated has image features of the feature control image; based on the text information, obtaining a text code corresponding to the text information; based on the text codes, intervening the image codes obtained by each round of reasoning in the diffusion model in the multi-round iterative reasoning process to obtain basic image codes of each round of reasoning; obtaining a feature control image code corresponding to the feature control image based on the feature control image; and based on the characteristic control image codes, intervening the basic image codes of target round reasoning until the reasoning process is finished, so as to obtain the drawing image to be generated.
Further, the logic instructions in the memory 630 may be implemented in the form of software functional units and stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product, the computer program product including a computer program, the computer program being storable on a non-transitory computer readable storage medium, the computer program, when executed by a processor, being capable of executing the drawing image generation method provided by the methods described above, the method comprising: acquiring text information and a feature control image, wherein the text information is text of an image picture describing a drawing image to be generated, and the drawing image to be generated has image features of the feature control image; based on the text information, obtaining a text code corresponding to the text information; based on the text codes, intervening the image codes obtained by each round of reasoning in the diffusion model in the multi-round iterative reasoning process to obtain basic image codes of each round of reasoning; obtaining a feature control image code corresponding to the feature control image based on the feature control image; and based on the characteristic control image codes, intervening the basic image codes of target round reasoning until the reasoning process is finished, so as to obtain the drawing image to be generated.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the drawing image generation method provided by the above methods, the method comprising: acquiring text information and a feature control image, wherein the text information is text of an image picture describing a drawing image to be generated, and the drawing image to be generated has image features of the feature control image; based on the text information, obtaining a text code corresponding to the text information; based on the text codes, intervening the image codes obtained by each round of reasoning in the diffusion model in the multi-round iterative reasoning process to obtain basic image codes of each round of reasoning; obtaining a feature control image code corresponding to the feature control image based on the feature control image; and based on the characteristic control image codes, intervening the basic image codes of target round reasoning until the reasoning process is finished, so as to obtain the drawing image to be generated.
The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
It will further be appreciated that although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of generating a pictorial image, the method comprising:
acquiring text information and a feature control image, wherein the text information is text of an image picture describing a drawing image to be generated, and the drawing image to be generated has image features of the feature control image;
based on the text information, obtaining a text code corresponding to the text information;
based on the text codes, intervening the image codes obtained by each round of reasoning in the diffusion model in the multi-round iterative reasoning process to obtain basic image codes of each round of reasoning;
obtaining a feature control image code corresponding to the feature control image based on the feature control image;
and based on the characteristic control image codes, intervening the basic image codes of target round reasoning until the reasoning process is finished, so as to obtain the drawing image to be generated.
2. The pictorial image generation method according to claim 1, wherein the intervening, based on the feature control image code, on the base image code of the target round inference until the inference process is finished specifically comprises:
in the case that the target round inference is before the reference round inference, globally replacing and intervening on the base image code, based on the feature control image code and according to a first preset replacement rate, until the inference process is finished, wherein the first preset replacement rate is greater than or equal to a replacement rate threshold, and the Gaussian noise of the base image code obtained before the reference round inference is greater than or equal to a noise threshold.
3. The pictorial image generation method according to claim 1, wherein the intervening, based on the feature control image code, on the base image code of the target round inference until the inference process is finished specifically comprises:
in the case that the target round inference is after the reference round inference, locally replacing and intervening on the base image code corresponding to the activation region, based on the feature control image code and according to a second preset replacement rate, until the inference process is finished, wherein the second preset replacement rate is less than a replacement rate threshold; the activation region is the region of the drawing image to be generated that needs to be intervened on, determined according to the image features of the feature control image; and the Gaussian noise of the base image code obtained after the reference round inference is less than a noise threshold.
4. A pictorial image generation method according to any of claims 1 to 3 wherein the feature control image comprises two or more and the corresponding feature control image code comprises two or more;
the intervening, based on the feature control image code, on the base image code of the target round inference until the inference process is finished specifically comprises:
Respectively determining a replacement weight value of each characteristic control image code;
and according to the replacement weight value, the basic image codes of the target round reasoning are replaced and intervened based on the characteristic control image codes until the reasoning process is finished.
5. The pictorial image generation method according to claim 4, wherein the replacing and intervening on the base image code of the target round inference based on the feature control image code until the inference process is finished specifically comprises:
performing multiple iterations of noise adding on the feature control image code, up to the round corresponding to the target round inference, to obtain a noised feature control image code;
and, based on the noised feature control image code, replacing and intervening on the base image code of the target round inference until the inference process is finished.
6. The pictorial image generation method according to claim 1, wherein, in the case where the diffusion model is a diffusion model from depth information to an image, the intervening, based on the feature control image code, on the base image code of the target round inference until the inference process is finished specifically comprises:
based on the feature control image code, intervening on a target channel of the base image code of the target round inference until the inference process is finished, wherein the target channel is any channel other than the channel corresponding to the depth information, and the depth information is the image depth information corresponding to the base image code.
7. The drawing image generation method according to claim 1, wherein the feature control image includes one or more of a feature control image having a preset depth image feature, a feature control image having a preset edge structure image feature, and a feature control image having a preset pose image feature.
8. A pictorial image generating device, the device comprising:
an acquisition module, configured to acquire text information and a feature control image, wherein the text information is text describing the image picture of a drawing image to be generated, and the drawing image to be generated has the image features of the feature control image;
the text coding module is used for obtaining a text code corresponding to the text information based on the text information;
the intervention module is used for intervening the image codes obtained by each round of reasoning in the multi-round iterative reasoning process of the diffusion model based on the text codes to obtain basic image codes of each round of reasoning;
The image coding module is used for obtaining a characteristic control image code corresponding to the characteristic control image based on the characteristic control image;
and the generation module is used for intervening, based on the feature control image code, on the base image code of the target round inference until the inference process is finished, so as to obtain the drawing image to be generated.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the pictorial image generation method of any of claims 1 to 7 when the program is executed by the processor.
10. A non-transitory computer-readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the drawing image generation method according to any one of claims 1 to 7.
CN202311003292.XA 2023-08-10 2023-08-10 Drawing image generation method and device, electronic equipment and storage medium Active CN116721186B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311003292.XA CN116721186B (en) 2023-08-10 2023-08-10 Drawing image generation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311003292.XA CN116721186B (en) 2023-08-10 2023-08-10 Drawing image generation method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116721186A true CN116721186A (en) 2023-09-08
CN116721186B CN116721186B (en) 2023-12-01

Family

ID=87872045

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311003292.XA Active CN116721186B (en) 2023-08-10 2023-08-10 Drawing image generation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116721186B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018089343A (en) * 2016-11-28 2018-06-14 キヤノンマーケティングジャパン株式会社 Medical image processing apparatus, control method and program of medical image processing apparatus
CN115018954A (en) * 2022-08-08 2022-09-06 中国科学院自动化研究所 Image generation method and device and electronic equipment
US20230103638A1 (en) * 2021-10-06 2023-04-06 Google Llc Image-to-Image Mapping by Iterative De-Noising
CN116051668A (en) * 2022-12-30 2023-05-02 北京百度网讯科技有限公司 Training method of diffusion model of draft map and image generation method based on text
CN116188632A (en) * 2023-04-24 2023-05-30 之江实验室 Image generation method and device, storage medium and electronic equipment
CN116245749A (en) * 2022-12-27 2023-06-09 北京百度网讯科技有限公司 Image generation method and device
CN116363249A (en) * 2023-03-31 2023-06-30 北京百度网讯科技有限公司 Controllable image generation method and device and electronic equipment
CN116392812A (en) * 2022-12-02 2023-07-07 阿里巴巴(中国)有限公司 Action generating method and virtual character animation generating method

Also Published As

Publication number Publication date
CN116721186B (en) 2023-12-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant