CN116580127B - Image generation method, device, electronic equipment and computer readable storage medium

Image generation method, device, electronic equipment and computer readable storage medium

Info

Publication number
CN116580127B
CN116580127B
Authority
CN
China
Prior art keywords
image
input
instruction
generated
adjusted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310856919.XA
Other languages
Chinese (zh)
Other versions
CN116580127A (en)
Inventor
疏坤
何山
殷兵
胡金水
刘聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN202310856919.XA
Publication of CN116580127A
Application granted
Publication of CN116580127B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 2D [Two Dimensional] image generation
    • G06T 11/20 Drawing from basic elements, e.g. lines or circles
    • G06T 11/206 Drawing of charts or graphs
    • G06T 11/60 Editing figures and text; Combining figures or text

Abstract

The application discloses an image generation method, an image generation device, an electronic device and a computer-readable storage medium. The method includes: acquiring a first generated image; in response to an image editing instruction for a region to be adjusted corresponding to the first generated image, acquiring a reference image or a first prompt word associated with the region to be adjusted as first generation information; generating an adjustment image based on the first generation information; and adjusting the region to be adjusted with the adjustment image to obtain a second generated image. This scheme can improve image generation efficiency.

Description

Image generation method, device, electronic equipment and computer readable storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image generating method, an image generating device, an electronic device, and a computer readable storage medium.
Background
With the rapid development of the internet and digital media, digital image processing technology has matured. To draw and process digital image works with personalized features efficiently, many artificial intelligence techniques have emerged to assist humans in image drawing, and automatic image generation technology is receiving increasing attention and research.
In existing automatic image generation technology, the generated image often fails to meet the user's requirements, and multiple attempts to regenerate the image are needed before an image that satisfies the user is obtained, so image generation efficiency is low. How to efficiently generate an image that meets the user's requirements has therefore become a problem to be solved.
Disclosure of Invention
The application mainly solves the technical problem of providing an image generation method, an image generation device, electronic equipment and a computer readable storage medium, which can improve the image generation efficiency.
To solve the above technical problem, a first aspect of the present application provides an image generation method, including: acquiring a first generated image; in response to an image editing instruction for a region to be adjusted corresponding to the first generated image, acquiring a reference image or a first prompt word associated with the region to be adjusted as first generation information; generating an adjustment image based on the first generation information; and adjusting the region to be adjusted with the adjustment image to obtain a second generated image.
To solve the above technical problem, a second aspect of the present application provides an image generation apparatus, including a first acquisition module, a second acquisition module, a generation module and an adjustment module. The first acquisition module is configured to acquire a first generated image; the second acquisition module is configured to, in response to an image editing instruction for a region to be adjusted corresponding to the first generated image, acquire a reference image or a first prompt word associated with the region to be adjusted as first generation information; the generation module is configured to generate an adjustment image based on the first generation information; and the adjustment module is configured to adjust the region to be adjusted with the adjustment image to obtain a second generated image.
To solve the above technical problem, a third aspect of the present application provides an electronic device, which includes a memory and a processor coupled to each other, where the processor is configured to execute program instructions stored in the memory, so as to implement the image generating method described in the first aspect.
In order to solve the above technical problem, a fourth aspect of the present application provides a computer-readable storage medium having stored thereon program instructions which, when executed by a processor, implement the image generating method described in the first aspect.
According to the above scheme, after the first generated image is acquired, the reference image or the first prompt word associated with the region to be adjusted is acquired as first generation information based on the image editing instruction, an adjustment image is generated from it, and the region to be adjusted is adjusted with the adjustment image to obtain the second generated image. The generated base map can thus be edited a second time so that the image better fits the user's requirements. Compared with repeatedly attempting to regenerate the image, an image meeting the user's requirements can be generated efficiently, improving image generation efficiency.
Drawings
FIG. 1 is a flowchart of an embodiment of the image generation method of the present application;
FIG. 2 is a flowchart of another embodiment of the image generation method of the present application;
FIG. 3 is a flowchart of a further embodiment of the image generation method of the present application;
FIG. 4 is a flowchart of step S310 in an embodiment of the present application;
FIG. 5 is a flowchart of step S310 in another embodiment of the present application;
FIG. 6 is a flowchart of step S310 in a further embodiment of the present application;
FIG. 7 is a flowchart of yet another embodiment of the image generation method of the present application;
FIG. 8 is a block diagram of an embodiment of the image generating apparatus of the present application;
FIG. 9 is a block diagram of an embodiment of the electronic device of the present application;
FIG. 10 is a block diagram of an embodiment of the computer-readable storage medium of the present application.
Detailed Description
In order to make the objects, technical solutions and effects of the present application clearer and more specific, the present application will be described in further detail below with reference to the accompanying drawings and examples. In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, interfaces, techniques, etc., in order to provide a thorough understanding of the present application.
The term "and/or" is herein merely an association relationship describing an associated object, meaning that there may be three relationships, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship. Further, "a plurality" herein means two or more than two. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Referring to fig. 1, fig. 1 is a flowchart illustrating an embodiment of an image generating method according to the present application. Specifically, the method may comprise the steps of:
Step S110: A first generated image is acquired.
The first generated image may be generated by an image generation model.
Step S120: In response to an image editing instruction for a region to be adjusted corresponding to the first generated image, acquire a reference image or a first prompt word associated with the region to be adjusted as first generation information.
The region to be adjusted corresponding to the first generated image may be a region within the first generated image, or a region outside the first generated image.
In a specific application scenario, the region to be adjusted corresponding to the first generated image is a partial region within the first generated image.
In a specific application scenario, the region to be adjusted corresponding to the first generated image is a region outside the first generated image.
The image editing instruction indicates the adjustment to be made to the region to be adjusted, and may be generated according to a user operation.
In some embodiments, the image editing instruction includes any of an addition instruction, a deletion instruction, a redrawing instruction, a color-adjustment instruction and a deformation instruction, where the color-adjustment instruction and the deformation instruction may each include several instruction categories. The addition instruction indicates adding image content on the basis of the first generated image, for example adding a new object. The deletion instruction indicates deleting the image of a certain region in the first generated image, for example deleting an object in it. The redrawing instruction indicates modifying the pixels of a certain region in the first generated image, for example deleting the original pixels of the region and generating a new image to add to it. The color-adjustment instruction indicates adjusting colors of the first generated image and may include several instruction categories, such as brightness adjustment, saturation adjustment, contrast adjustment and hue adjustment. The deformation instruction indicates deforming the image of certain regions in the first generated image and may include several instruction categories, such as position deformation, shape deformation and posture deformation, that is, adjusting the position, shape or posture of objects in the first generated image. Examples are moving a tree from the left side of the image to the right side, enlarging a tree, or deforming a person whose head is lowered so that the head is raised.
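Purely as an illustration and not part of the original disclosure, the following Python sketch shows one possible way to represent the instruction taxonomy described above; all names are hypothetical.

```python
# Illustrative only: the patent specifies no data structures.
# One possible representation of the editing-instruction taxonomy.
from enum import Enum, auto

class EditInstruction(Enum):
    ADD = auto()           # add new content based on the first generated image
    DELETE = auto()        # remove a region and refill it from surrounding pixels
    REDRAW = auto()        # delete a region's pixels and generate new content for it
    COLOR_ADJUST = auto()  # color adjustment (several categories below)
    DEFORM = auto()        # deformation (several categories below)

class ColorAdjustCategory(Enum):
    BRIGHTNESS = auto()
    SATURATION = auto()
    CONTRAST = auto()
    HUE = auto()

class DeformCategory(Enum):
    POSITION = auto()
    SHAPE = auto()
    POSTURE = auto()
```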
The reference image or the first prompt word serves as the first generation information, from which an adjustment image for adjusting the region to be adjusted is generated.
In an implementation scenario, when the image editing instruction is an addition instruction, the region to be adjusted may be a region within the first generated image, a region outside the first generated image, or partly within and partly outside the first generated image. The addition instruction indicates adding image content to the region to be adjusted.
In an implementation scenario, when the image editing instruction is a deletion instruction, the region to be adjusted may be a region within the first generated image. The deletion instruction does not merely indicate deleting the pixels in the region to be adjusted; it also indicates refilling the region, that is, deleting the original image in the region to be adjusted and refilling that region.
In an implementation scenario, when the image editing instruction is a redrawing instruction, the region to be adjusted may be a region within the first generated image. The redrawing instruction indicates deleting the original image in the region to be adjusted and redrawing the image of that region.
It should be noted that the difference between the deletion instruction and the redrawing instruction is that the deletion instruction fills the region to be adjusted according to the pixel values around the region in the first generated image, whereas the redrawing instruction redraws the image of the region independently of the first generated image. For example, suppose the first generated image shows a lawn with a puppy on it and the region to be adjusted is the region where the puppy is located. The deletion instruction may be used to delete the puppy and fill the region with surrounding pixels of the first generated image, so that the image becomes a lawn with no dog; the redrawing instruction may be used to redraw the region, for example drawing a cat, so that the image becomes a cat on the lawn.
In an implementation scenario, when the image editing instruction is a color-adjustment instruction, the region to be adjusted may be a region within the first generated image. The color-adjustment instruction indicates adjusting the colors of the original image in the region to be adjusted, for example brightness, saturation, contrast or hue.
In an implementation scenario, when the image editing instruction is a deformation instruction, the region to be adjusted may include a first adjustment sub-region and a second adjustment sub-region. The first adjustment sub-region, a region within the first generated image, is used to obtain the adjustment image; the first and second adjustment sub-regions together serve as the regions adjusted with the adjustment image, and the second adjustment sub-region may be a region within or outside the first generated image.
In some embodiments, the image editing instruction is not limited to the addition, deletion, redrawing, color-adjustment and deformation instructions above; corresponding editing instructions may be set according to the user's editing requirements for the first generated image.
Step S130: an adjustment image is generated based on the first generation information.
The first generation information may be a reference image associated with the region to be adjusted, or a first prompt word. Specifically, generating the adjustment image based on the first generation information may mean generating the adjustment image based on the reference image, or generating it based on the first prompt word. The reference image associated with the region to be adjusted may be the image within the region to be adjusted, or the image within a preset range around the region to be adjusted.
Step S140: The region to be adjusted is adjusted using the adjustment image to obtain a second generated image.
The region to be adjusted may be adjusted with the adjustment image in various ways, for example by fusing the adjustment image with the first generated image according to the region to be adjusted, or by replacing the image of the region to be adjusted with the adjustment image. Further, the manner of adjustment can be related to the category of the image editing instruction, so as to realize different forms of editing of the region to be adjusted.
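As a minimal sketch of the fusion option mentioned above (an assumed implementation, since the patent fixes no particular fusion method), mask-based compositing with NumPy could look as follows; the function and array names are hypothetical.

```python
# Hypothetical sketch: fuse an adjustment image into the region to be
# adjusted via a binary mask (1 marks the region to be adjusted).
import numpy as np

def fuse(base: np.ndarray, adjustment: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """base, adjustment: HxWx3 uint8 arrays; mask: HxW array in {0, 1}."""
    mask3 = mask[..., None].astype(np.float32)
    out = base * (1.0 - mask3) + adjustment * mask3
    return out.astype(np.uint8)
```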
According to the above scheme, after the first generated image is obtained, the reference image or the first prompt word associated with the region to be adjusted is obtained, based on the image editing instruction, as first generation information; an adjustment image is generated from it, and the region to be adjusted is adjusted with the adjustment image to obtain the second generated image. The generated base map can thus be edited a second time so that the image better fits the user's requirements and meets the image design and production demands of diverse scenarios. Compared with repeatedly attempting to regenerate the image, an image meeting the user's requirements can be generated efficiently, improving image generation efficiency.
Furthermore, adjustment modes such as addition, deletion, redrawing, color adjustment and deformation are supported, so the first generated image can conveniently be adjusted in multiple forms to satisfy the user's various modification requirements. This improves the flexibility of image generation; through modification the generated image better fits the user's requirements, improving image generation efficiency.
Referring to fig. 2, fig. 2 is a flowchart illustrating an image generating method according to another embodiment of the application. Specifically, the method may comprise the steps of:
Step S210: a first generated image is acquired.
The first generated image may be a base map obtained directly with an image generation model. Specifically, the image generation model generates an image based on second generation information to obtain the first generated image.
Step S220: In response to an image editing instruction for a region to be adjusted corresponding to the first generated image, acquire a first prompt word as first generation information.
The image editing instruction may be used to instruct generating the adjustment image independently of the first generated image; that is, the first generation information and the first generated image are independent of each other, and the adjustment image is used to adjust the first generated image.
In this embodiment, the image editing instruction may be an addition instruction or a redrawing instruction, the first generation information includes a first prompt word, and the first prompt word is obtained based on user input.
Further, the first prompt word may be obtained from information input by the user, and the information input by the user may take various forms, such as voice, text or an image.
In an implementation scenario, several words input by the user are obtained as first prompt words and serve as the first generation information.
Step S230: an adjustment image is generated based on the first generation information.
In this embodiment, the first generation information includes a first prompt word.
Further, generating the adjustment image based on the first prompt word may specifically include: generating an adjustment image matching the first prompt word using an image generation model. The image generation model may be a text-to-image model, such as the stable-diffusion model v2, the Midjourney model or the CogView model.
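As a hedged illustration (the patent names stable-diffusion v2 but no concrete API), generating an adjustment image from the first prompt word might look as follows with the open-source Hugging Face diffusers library; the checkpoint name and prompt are assumptions, not part of the disclosure.

```python
# Sketch assuming the Hugging Face diffusers library and a public
# Stable Diffusion 2 checkpoint; neither is mandated by the patent.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

first_prompt_word = "lovely, smiling, boy"      # example first prompt word
adjustment_image = pipe(first_prompt_word).images[0]
adjustment_image.save("adjustment.png")
```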
In some embodiments, the shape and size of the adjustment image may be matched to the region to be adjusted; in other embodiments, the shape and size of the adjustment image are not limited.
Step S240: The region to be adjusted is adjusted using the adjustment image to obtain a second generated image.
The adjustment image may be added to the region to be adjusted to obtain the second generated image.
In a specific application scenario, the image editing instruction is an addition instruction, and adjusting the region to be adjusted with the adjustment image can be realized by fusing the adjustment image with the first generated image according to the region to be adjusted. The region to be adjusted may include a region within the first generated image or a region outside the first generated image. Further, since the adjustment image is generated separately by the image generation model, before the fusion the device can adjust the size and shape of the adjustment image according to the region to be adjusted, to facilitate the fusion.
In a specific application scenario, the image editing instruction is a redrawing instruction, and adjusting the region to be adjusted with the adjustment image may be realized by replacing the image in the region with the adjustment image. Further, since the redrawing instruction instructs redrawing the region to be adjusted, the shape and size of the adjustment image can be kept consistent with the region, so the device can delete the original image of the region in the first generated image and fuse the adjustment image with the first generated image after the deletion; the image in the region is thus replaced with the newly drawn adjustment image, completing the redrawing.
In some embodiments, the original image of the region to be adjusted may be directly covered with the adjustment image to complete the redrawing.
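One assumed way to realize the redrawing instruction end to end is mask-conditioned inpainting; the sketch below uses diffusers' inpainting pipeline, an implementation choice not stated in the patent, with hypothetical file names and prompt.

```python
# Assumed realization of the redrawing instruction as mask-conditioned
# inpainting: the masked region is regenerated to match the prompt.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

base = Image.open("first_generated.png").convert("RGB")   # first generated image
mask = Image.open("region_mask.png").convert("L")         # white = region to redraw
second = pipe(prompt="a cat on the lawn", image=base, mask_image=mask).images[0]
second.save("second_generated.png")
```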
In some embodiments, when the image editing instruction is an addition instruction, the adjustment image may be generated from at least part of the first prompt word, so as to obtain the second generated image.
Referring to fig. 3, fig. 3 is a flowchart illustrating an image generating method according to another embodiment of the application. Specifically, the method may comprise the steps of:
Step S310: a first generated image is acquired.
In some embodiments, before responding to the image editing instruction for the region to be adjusted corresponding to the first generated image and acquiring at least one of the reference image and the first prompt word as first generation information, the device may determine the region to be adjusted according to a user operation.
In an implementation scenario, the device may perform semantic segmentation on the first generated image to obtain a semantic segmentation result containing several first sub-regions of the first generated image, take the regions other than the first generated image as second sub-regions, and, in response to a user's selection operation, take the selected region as the region to be adjusted. Both the first sub-regions and the second sub-regions can be selected by the user, which increases the diversity and convenience of the selection: in response to a selection of a second sub-region, the selected second sub-region serves as the region to be adjusted; in response to a selection of at least one first sub-region, the selected first sub-region(s) serve as the region to be adjusted.
In an implementation scenario, the device may provide the user with custom region selection, for example a painting tool or selection frames of preset shapes, for the user to select the region to be adjusted.
In an implementation scenario, the device may determine the region to be adjusted from the user's editing operation. For example, if the user selects a first sub-region obtained by semantic segmentation and performs a deformation operation on it, the device can determine that the image editing instruction is a deformation instruction, that the first sub-region is a region to be adjusted, and that the deformed first sub-region also serves as a region to be adjusted.
The above embodiments are merely examples; the manner of determining the region to be adjusted may be set according to actual needs.
In a specific application scenario, the user selects a first sub-region obtained by semantic segmentation and drags it to a designated region. The device can then determine that the image editing instruction is a position-deformation instruction, where the selected first sub-region is the first adjustment sub-region used to generate the adjustment image, and the selected first sub-region and the designated region are the second adjustment sub-regions used for adjustment. The device can obtain an adjustment image from the image in the first sub-region, to fill the designated region, and generate an adjustment image from the image within a preset range of the first sub-region, to fill the first sub-region; this realizes the position deformation of the original first sub-region image while filling the vacated first sub-region with surrounding pixels and maintaining the integrity of the first generated image.
In a specific application scenario, the user selects a first sub-region obtained by semantic segmentation and drags the image in it to enlarge it proportionally; the position of the enlarged image is the designated region, and the device can determine that the image editing instruction is a shape-deformation instruction. The selected first sub-region is the first adjustment sub-region, used to generate the adjustment image; the selected first sub-region and the designated region are the second adjustment sub-regions, used for adjustment. The device may obtain an adjustment image by enlarging the image in the first sub-region, to fill the designated region, and generate an adjustment image from the image within a preset range of the first sub-region, to fill the first sub-region. Further, the designated region where the proportionally enlarged image lies may overlap the first sub-region; in that case the adjustment image generated from the preset-range image is used to fill only the part of the first sub-region that does not belong to the designated region. For example, if the designated region covers the first sub-region, the first sub-region needs no refilling after the designated region is adjusted; if the designated region and the first sub-region merely intersect, the intersecting part needs no refilling.
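To make this shape-deformation flow concrete, here is a hedged OpenCV sketch (an assumed implementation; the coordinates and amplification factor are hypothetical): the selected sub-region is enlarged, pasted at the designated region, and any uncovered part of the original sub-region is refilled from surrounding pixels.

```python
# Illustrative sketch of proportional enlargement with refilling,
# assuming OpenCV; region coordinates and scale are hypothetical.
import cv2
import numpy as np

img = cv2.imread("first_generated.png")
x, y, w, h = 100, 120, 80, 60            # first adjustment sub-region (hypothetical)
scale = 1.5                               # equal-proportion amplification factor

patch = cv2.resize(img[y:y + h, x:x + w], None, fx=scale, fy=scale)
cx, cy = x + w // 2, y + h // 2           # keep the region centre fixed
x2, y2 = max(cx - patch.shape[1] // 2, 0), max(cy - patch.shape[0] // 2, 0)
ph = min(patch.shape[0], img.shape[0] - y2)
pw = min(patch.shape[1], img.shape[1] - x2)

# Refill only the part of the original sub-region that the enlarged
# patch will not cover, using surrounding pixels, then paste the patch.
orig_mask = np.zeros(img.shape[:2], np.uint8)
orig_mask[y:y + h, x:x + w] = 255
orig_mask[y2:y2 + ph, x2:x2 + pw] = 0
img = cv2.inpaint(img, orig_mask, 3, cv2.INPAINT_TELEA)
img[y2:y2 + ph, x2:x2 + pw] = patch[:ph, :pw]
cv2.imwrite("second_generated.png", img)
```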
In a specific application scenario, the user can click any region of the image and input text; the text describes the newly added content as a prompt word, the corresponding image content is then generated with a text-to-image model, and image fusion is performed automatically to embed the content into the target image. The region the user clicks may be used to determine the region to be adjusted to which the adjustment image is added, or may be combined with semantic information to determine the region to be adjusted.
Step S320: In response to an image editing instruction for the region to be adjusted corresponding to the first generated image, acquire a reference image associated with the region to be adjusted as first generation information.
The image editing instruction may instruct generating the adjustment image based on a reference image associated with the region to be adjusted, with the adjustment image used to adjust the first generated image; the reference image associated with the region to be adjusted may come from the first generated image.
In this embodiment, the image editing instruction may be a color-adjustment instruction, a deformation instruction or a deletion instruction, and the first generation information includes the reference image associated with the region to be adjusted. The reference image may include the pixels of the region to be adjusted, or pixels within a preset range around the region to be adjusted.
In an implementation scenario, the image editing instruction may be a color-adjustment instruction, the first generation information includes a reference image, and the reference image includes the pixels within the region to be adjusted.
In an implementation scenario, the image editing instruction may be a deformation instruction, the first generation information includes a reference image, and the reference image includes the pixels in the region to be adjusted; further, the region to be adjusted includes a first adjustment sub-region, and the reference image includes the pixels in the first adjustment sub-region.
In an implementation scenario, the image editing instruction may be a deletion instruction, the first generation information includes a reference image, and the reference image includes pixels within a preset range around the region to be adjusted.
Step S330: an adjustment image is generated based on the first generation information.
In this embodiment, the color-adjustment instruction may include several instruction categories, with different categories representing adjustments in different dimensions. For example, the color-adjustment instruction may include four categories, brightness adjustment, saturation adjustment, contrast adjustment and hue adjustment, representing adjustments in the dimensions of brightness, saturation, contrast and hue respectively. The deformation instruction may likewise include several categories; for example, position deformation, shape deformation and posture deformation, adjusting the dimensions of position, shape and posture respectively.
In an implementation scenario, the first generation information includes a reference image, the image editing instruction may be a color-adjustment instruction or a deformation instruction, and the reference image includes the pixels of the region to be adjusted. Generating the adjustment image based on the first generation information may then specifically include: adjusting the pixels contained in the reference image based on the current instruction category to obtain the adjustment image, where the adjustment operation corresponds to the current instruction category.
Further, when the image editing instruction is a color-adjustment instruction, the colors of the pixels in the region to be adjusted are adjusted, and when the adjustment image is generated its shape and size can be matched to the region to be adjusted.
In a specific application scenario, the image editing instruction is a brightness-adjustment instruction, the reference image includes the pixels in the region to be adjusted, and adjusting those pixels may consist of adjusting their brightness to obtain the adjustment image.
In a specific application scenario, the image editing instruction is a saturation adjustment, the reference image includes the pixels in the region to be adjusted, and adjusting those pixels may consist of adjusting their saturation to obtain the adjustment image.
Further, when the image editing instruction is a deformation instruction, it means deforming based on the pixels in the region to be adjusted. The shape and size of the adjustment image may also be determined from the current instruction category. Specifically, for a position-deformation instruction, the shape and size of the adjustment image may coincide with the first adjustment sub-region of the region to be adjusted. For shape-deformation and posture-deformation instructions, the shape and size of the adjustment image match the specific transformation required by the instruction and are not necessarily consistent with the region to be adjusted.
In a specific application scenario, the image editing instruction is a shape-deformation instruction, the reference image includes the pixels in the region to be adjusted, and adjusting those pixels may consist of adjusting their shape. Further, the region to be adjusted includes a first adjustment sub-region, the reference image includes the pixels in the first adjustment sub-region, and the adjustment may consist of adjusting the shape of the pixels in that sub-region. The shape adjustment can take various forms, such as proportional enlargement, width increase or affine transformation.
In a specific application scenario, the image editing instruction is a position-deformation instruction, the reference image includes the pixels in the region to be adjusted, and adjusting those pixels may consist of adjusting their positions. Further, the region to be adjusted includes a first adjustment sub-region, the reference image includes the pixels in that sub-region, and the adjustment may consist of adjusting the positions of those pixels; the adjustment image may then coincide with the reference image.
Note that a color-adjustment instruction or a deformation instruction may include instructions of at least one instruction category. For example, a color-adjustment instruction may include brightness adjustment and saturation adjustment, so that generating the adjustment image based on the reference image includes adjusting both the brightness and the saturation of the pixels contained in the reference image.
In an implementation scenario, the first generation information includes a reference image, the image editing instruction may be a deletion instruction, and the reference image includes pixels within a preset range around the region to be adjusted. Generating the adjustment image based on the first generation information may then specifically include: generating an adjustment image matching the region to be adjusted based on the pixels contained in the reference image. The preset range can be set according to actual needs, and the shape and size of the adjustment image match the region to be adjusted.
In a specific application scenario, the region to be adjusted is the region where a puppy on a lawn is located, and the reference image consists of the pixels within a preset range of that region, specifically the lawn image around the region where the puppy is located. Generating an adjustment image matching the region to be adjusted based on the pixels contained in the reference image may specifically be generating, from the pixels within the preset range, an adjustment image consistent with the shape and size of the puppy region, so as to obtain a lawn image that does not contain the puppy.
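A minimal sketch of this deletion example, assuming classical OpenCV inpainting as the mechanism for filling the region from surrounding pixels (the patent requires such filling but names no algorithm); the file names are hypothetical.

```python
# Assumed realization of the deletion instruction: fill the puppy
# region from surrounding lawn pixels with classical inpainting.
import cv2

img = cv2.imread("lawn_with_puppy.png")                     # first generated image
mask = cv2.imread("puppy_mask.png", cv2.IMREAD_GRAYSCALE)   # 255 = region to delete
filled = cv2.inpaint(img, mask, 5, cv2.INPAINT_TELEA)
cv2.imwrite("lawn_without_puppy.png", filled)
```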
Step S340: The region to be adjusted is adjusted using the adjustment image to obtain a second generated image.
The region to be adjusted may be adjusted with the adjustment image in various ways, and the manner of adjustment may be related to the category of the image editing instruction.
In a specific application scenario, the image editing instruction is a deletion instruction, and adjusting the region to be adjusted with the adjustment image may be realized by replacing the image in the region with the adjustment image. Further, the deletion instruction instructs deleting the original image content of the region to be adjusted; to keep the first generated image complete, the shape and size of the adjustment image can be kept consistent with the region, so the device can delete the original image of the region in the first generated image and fuse the adjustment image with the first generated image after the deletion, thereby deleting the original image of the region while preserving the integrity of the first generated image.
In a specific application scenario, the image editing instruction is a color-adjustment instruction, and adjusting the region to be adjusted with the adjustment image can likewise be realized by replacing the image in the region with the adjustment image. Further, since the color-adjustment instruction instructs adjusting the colors of the image in the region to be adjusted, the shape and size of the adjustment image can be kept consistent with the region, so the device can delete the original image of the region and fuse the color-adjusted adjustment image with the first generated image after the deletion, replacing the original image and realizing the color adjustment of the region.
In some embodiments, the image editing instruction is a color-adjustment instruction, the adjustment image may be superimposed on the pixels of the region to be adjusted to realize the color adjustment, and adjusting the region with the adjustment image may also be realized by fusing the adjustment image with the first generated image according to the region to be adjusted.
In some embodiments, the image editing instruction is a color-adjustment instruction or a deformation instruction; the categories of these instructions are not limited to the above examples, and the adjustment image may be generated in various ways. For example, the adjustment image may be generated based on the current instruction category combined with semantic information of the reference image; when a person is deformed in posture, for instance, an adjustment image conforming to both the person's semantic information and the posture deformation can be generated.
Referring to fig. 4, fig. 4 is a flowchart illustrating a step S310 according to another embodiment of the present application. Specifically, step S310 may include the steps of:
Step S411: Input information is acquired.
The input information is obtained based on a user operation and may take various forms; for example, it may include at least one of input voice, input text and an input image. Of course, the input information is not limited to voice, text and image, and other forms may be set according to the user's needs, so that multimodal information can be processed.
In a specific application scenario, a user uploads an image to generate an image corresponding to the uploaded image, and the device may acquire the image as an input image.
In a specific application scenario, a user inputs a voice to generate an image corresponding to the voice, and the device may acquire the voice as the input voice.
In a specific application scenario, a user enters speech and text to generate images corresponding to the speech and text.
In some embodiments, the input image may be an image captured by the device or an image drawn by the user.
In a specific application scenario, the image collected by the device may be collected by the image collecting device, and the image drawn by the user may be a sketch sketched by the user.
In some embodiments, the input information includes input text, and extracting the second generation information from the input information includes: extracting several text keywords from the input text and using the text keywords as second prompt words.
In some embodiments, the input information includes input speech. Before extracting the second generation information from the input information, the device performs speech recognition on the input speech to obtain recognized text and updates the input text with the recognized text; several text keywords are then extracted from the input text as second prompt words.
In an implementation scenario, the input text is already embodied as several text keywords, which can then be used directly as second prompt words.
In a specific application scenario, the user directly inputs the text "lovely, smiling, boy", and these text keywords are used as second prompt words.
In a specific application scenario, input voice is acquired, and the device performs speech recognition on it to obtain the recognized text "lovely, smiling, boy", whose text keywords are used as second prompt words.
In an implementation scenario, the input text is a text paragraph, from which several text keywords are extracted and used as second prompt words.
In a specific application scenario, the user directly inputs the text "I want to generate a scene of a summer beach, a group of children chasing each other, sea waves gently lapping at the children's feet, the children's smiles as bright as the blazing sun", and the text keywords "summer, beach, scenery, children, chasing, sea waves, feet, smiles, blazing sun, bright" are extracted from the input text.
It should be noted that text keyword extraction may be implemented with a large language model, for example a LLaMA model, a ChatGLM-6B model or a PaLM-E model.
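As an illustration of this step (hedged: the patent names candidate models but no interface), a prompt pattern for keyword extraction could look as follows; `chat` is a hypothetical placeholder for any large-language-model call, not a real API.

```python
# Hypothetical sketch: `chat` stands in for any LLM call (LLaMA,
# ChatGLM-6B, etc.); it is a placeholder, not a real API.
def extract_keywords(chat, input_text: str) -> list[str]:
    instruction = (
        "Extract the visual keywords from the following text as a "
        "comma-separated list (scene, objects, actions, mood):\n" + input_text
    )
    reply = chat(instruction)            # e.g. "summer, beach, children, ..."
    return [w.strip() for w in reply.split(",") if w.strip()]
```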
Step S412: second generation information is extracted from the input information.
The second generation information is used to instruct the image generation model to generate the first generated image, and includes at least one of a second prompt word and input image information.
It should be noted that input voice and input text may be used to extract the second prompt word, while an input image may be used to extract both the second prompt word and the input image information. Further, there are various ways of extracting the second generation information from an input image, and a suitable extraction manner may be selected according to actual needs.
Step S413: A first generated image matching the second generation information is generated using the image generation model.
The image generation model generates the first generated image based on the second generation information and may be selected according to the form of the second generation information, for example a stable-diffusion model or a generative adversarial network (GauGAN).
According to the above scheme, the device can process multimodal information to generate images automatically, adapting to inputs of different forms, which improves the flexibility and convenience of image generation and meets the image design and production demands of diverse scenarios.
Referring to fig. 5, fig. 5 is a flowchart illustrating a step S310 according to another embodiment of the present application. Specifically, step S310 may include the steps of:
Step S511: Input information is acquired.
In this embodiment, the input information includes an input image, and the input image is an image captured by the device. Extracting the second generation information from the input information may include steps S512 to S513, which extract the second generation information based on the input image.
Step S512: Perform scene recognition on the input image to obtain the scene category as a first candidate prompt word, perform semantic segmentation on the input image to obtain several object categories as second candidate prompt words, and perform keyword analysis on the input image to obtain image keywords as third candidate prompt words.
Scene recognition, semantic segmentation and keyword analysis may each be implemented with a corresponding model; keyword analysis, for example, may be implemented with an img2prompt-related model or tool. Scene recognition obtains the scene category to which the input image belongs, which serves as the first candidate prompt word. Semantic segmentation obtains the object category of each object in the input image, and these serve as second candidate prompt words. Keyword analysis obtains the image keywords of the input image, which can represent features of the input image, such as positional relationships between objects in the image.
Step S513: In response to a specified similarity, select the second prompt words from the first candidate prompt word, the second candidate prompt words and the third candidate prompt words, based on the similarity interval in which the specified similarity falls.
In this embodiment, the second prompt words extracted from the input image in the above manner may serve as the second generation information. The specified similarity indicates the desired similarity between the input image and the first generated image, and the number of second prompt words is positively correlated with its value. The specified similarity may be determined in advance in response to a user operation.
In a specific application scenario, the device may let the user select the specified similarity between the first generated image and the input image, for example from 0.1 to 1.
The selectable range of the second prompt words is related to the similarity interval in which the specified similarity falls. Further, when the specified similarity is less than or equal to a first threshold, the second prompt word is selected from the first candidate prompt word. When the specified similarity lies between the first threshold and a second threshold, the first candidate prompt word is used as a second prompt word, and several second candidate prompt words are selected to join it, the number selected being proportional to the specified similarity. When the specified similarity is greater than the second threshold, the first candidate prompt word and all the second candidate prompt words are used as second prompt words, and several third candidate prompt words are selected to join them, the number selected being proportional to the specified similarity.
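This selection rule can be made concrete with the formulas given later in the description (j = max{(s - 0.1) * J / 0.4, 1} and m = max{(s - 0.5) * M / 0.5, 1}); the sketch below assumes thresholds of 0.1 and 0.5 and integer truncation, details the patent does not spell out.

```python
# Sketch of similarity-driven prompt selection; thresholds 0.1/0.5 and
# int() truncation are assumptions consistent with the example below.
def select_prompts(s, scene, objects, keywords):
    J, M = len(objects), len(keywords)
    if s <= 0.1:                               # scene category only
        return [scene]
    if s <= 0.5:                               # scene + some object categories
        j = max(int((s - 0.1) * J / 0.4), 1)
        return [scene] + objects[:j]
    m = max(int((s - 0.5) * M / 0.5), 1)       # scene + all objects + keywords
    return [scene] + objects + keywords[:m]

print(select_prompts(0.5, "beach", ["child", "wave", "sun"], ["bright", "chase"]))
# -> ['beach', 'child', 'wave', 'sun']
```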
In this way, the first generated image can be generated in accordance with the user-specified similarity, which improves the flexibility of image generation, allows an image whose similarity meets the user's requirements to be generated more accurately, and improves image generation efficiency.
Step S514: A first generated image matching the second generation information is generated using the image generation model.
The image generation model may be a text-to-image model for generating a first generated image matching the second prompt words.
It should be noted that the first prompt word may be obtained in the same manner as the second prompt word described in the foregoing embodiments.
Referring to fig. 6, fig. 6 is a flowchart illustrating a step S310 according to another embodiment of the present application. Specifically, step S310 may include the steps of:
Step S611: Input information is acquired.
In this embodiment, the input information includes an input image, and the input image is an image captured by the device. Extracting the second generation information from the input information may include step S612, which extracts the second generation information based on the input image.
Step S612: Perform keyword analysis on the input image to obtain image keywords as second prompt words, and obtain feature information of objects in the input image as input image information.
For keyword analysis, refer to the related description in the previous embodiment. Specifically, the feature information of an object in the input image may include at least one of the object contour, depth, corner points, key points and normals of the three-dimensional model; it may be extracted from the input image and may take the form of a mask map.
In a specific application scenario, a ControlNet model may be used to extract the feature information of the objects from the input image as the input image information, to serve as part of the second generation information.
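A hedged sketch of ControlNet-conditioned base-map generation, assuming diffusers' ControlNet integration and public checkpoints (the patent names ControlNet and stable-diffusion but no concrete models or API):

```python
# Sketch assuming diffusers' ControlNet support; checkpoints are
# illustrative, and the conditioning image is a pre-extracted edge map.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

edge_map = load_image("input_edges.png")      # feature map from the input image
base_map = pipe("a smiling boy, photo", image=edge_map).images[0]
base_map.save("base_map.png")
```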
Step S613: A first generated image matching the second generation information is generated using the image generation model.
In a specific application scenario, a stable-diffusion model may be used to generate the first generated image based on the input image information and the second prompt words.
In some embodiments, when the input information includes an input image and the input image is an image drawn by the user, extracting the second generation information from the input information may include performing semantic analysis on the input image to obtain image semantics as the input image information. The input image may be a sketch drawn by the user, containing the object outlines the user has sketched.
In a specific application scenario, a generative adversarial network is used to generate real objects from the user-drawn image and color them to obtain the generated base map; further, the generative adversarial network can perform semantic analysis on the input image to obtain the image semantics used for generating and coloring the real objects.
In some embodiments, the input information may also include both input text and input image, or both input speech and input image, or both input text, input speech, and input image.
It should be noted that embodiments of the present application that do not technically contradict one another may be combined, and the ways of extracting the second generation information from the input information in the above embodiments may also be combined.
In a specific application scenario, the device obtains given prompt words, such as "lovely, smiling, boy", and generates the base map using the stable-diffusion model v2.
In a specific application scenario, text is converted into a prompt-word list based on a large language model. For example, for "I want to generate a scene of a summer beach, a group of children chasing each other, sea waves gently lapping at the children's feet, the children's smiles as bright as the blazing sun", the prompt-word list produced by the large language model is "summer, beach, scenery, children, chasing, sea waves, feet, smiles, blazing sun, bright".
In a specific application scenario, a base map similar to the elements of a reference map (objects, style, hue, composition, etc.) is automatically generated from the image input by the user. Further, let the prompt-word list finally used for generating the image be PN, where N is the total number of prompt words. A prompt-word list KM corresponding to the image is automatically generated with an img2prompt-related model or tool, where M is the total number of list words. Based on image scene recognition, the scene category S of the image is acquired; at the same time, object segmentation and detection (SAM) is executed and all object categories O_1, ..., O_J are recorded, where J is the total number of objects. The device may obtain the similarity (from 0.1 to 1) between the generated map and the reference map selected by the user; the higher the similarity, the richer the PN prompt-word list. The specific setting is as follows: similarity 0.1 corresponds to PN = [S], with only the scene category used as prompt word and as input for generating the base map; similarity 0.5 corresponds to PN = [S, O_1, ..., O_J], with the scene category and object categories used as input, where for similarity s the number of objects put into PN is j = max{(s - 0.1) * J / 0.4, 1}; similarity 1 corresponds to PN = [S, O_1, ..., O_J, K_1, ..., K_M], with the scene category, the object categories and the img2prompt words used as input, where for similarity s the number of KM prompt words put into PN is m = max{(s - 0.5) * M / 0.5, 1}. Once the prompt-word list PN is available, the base map is generated directly with the stable-diffusion v2 text-to-image model. Images generated in this way have high richness, and the generated base map has low similarity with the input image.
In a specific application scenario, a base map consistent with the input image's contours, depth, corner points, key points, normals of the corresponding three-dimensional model, etc. is generated from the input image. Further, the feature information of objects in the input image is extracted as the input image information by a ControlNet model, specifically including at least one of the contour, depth, corner points, key points and normals of the corresponding three-dimensional model mentioned above. A prompt-word list KM corresponding to the image is automatically generated with an img2prompt-related model or tool, and together with the input image information serves as the second generation information; the base map is then generated with a stable-diffusion model. If a portrait is generated, the result can be consistent with the gender, hairstyle, figure, body posture, etc. of the person in the input image, and the base map generated in this way has very high similarity with the input image.
In a specific application scenario, the device acquires an image drawn by the user, and the system automatically generates real objects from its content and colors them; images generated in this way have low similarity to the input, rich imagination and possibly abstract content. Specifically, the device may obtain a sketch drawn by the user, automatically generate real objects from the content based on a generative adversarial network (GauGAN), and color them to generate the base map.
Referring to fig. 7, fig. 7 is a flowchart of an image generating method according to another embodiment of the application. Specifically, the method may comprise the steps of:
Step S710: A first generated image is acquired.
Step S720: In response to an image editing instruction for a region to be adjusted corresponding to the first generated image, acquire a reference image or a first prompt word associated with the region to be adjusted as first generation information.
Step S730: an adjustment image is generated based on the first generation information.
Step S740: The region to be adjusted is adjusted using the adjustment image to obtain a second generated image.
For the descriptions of steps S710 to S740, reference may be made to the foregoing embodiments; they are not repeated here.
Step S750: the second generated image is updated to the first generated image.
The second generated image is obtained by adjusting the original first generated image. After the second generated image is obtained, the device may update the first generated image with it, that is, replace the original first generated image with the adjusted one.
It should be noted that, after the base map is generated, the device may provide the user with an image editing page for editing it. Steps S720 to S750 above may be regarded as one edit of the first generated image, and this editing step may be performed several times: after the second generated image is updated to the first generated image, steps S720 to S750 may be re-executed in response to a new image editing instruction, and the first generated image is updated again, as sketched below.
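In the sketch below, all four handlers are caller-supplied placeholders for steps S720 to S740 (the specification does not prescribe their signatures), and next_instruction returns None once the user stops editing.

def run_editing_loop(first_image, next_instruction, acquire_info,
                     generate_adjustment, apply_adjustment):
    image = first_image
    while True:
        instruction = next_instruction()            # None once the user is done
        if instruction is None:
            return image                            # hand off to S760 or S770
        info = acquire_info(image, instruction)             # S720
        adjustment = generate_adjustment(info)              # S730
        image = apply_adjustment(image, instruction, adjustment)  # S740 + S750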
Step S760: in response to the obtained image direct output instruction, the current first generated image is taken as a target generated image.
The target generated image is a final output result of the current image generation, and the image direct output instruction is used for instructing the device to output the current first generated image as the target generated image.
Step S770: and responding to the obtained image conversion output instruction, and converting the current first generated image based on the image conversion output instruction to obtain a target generated image.
The image conversion output instruction is used for instructing the device to convert the current first generated image, and the conversion result is output as a target generated image. The image conversion output instruction includes at least one of a style conversion instruction and a super resolution reconstruction instruction.
The conversion operation described above can be regarded as post-processing; the content of the image (existing elements, composition, etc.) is not changed.
In an implementation scenario, the style conversion instruction is used to perform style migration on the image. The device may pre-store a plurality of preset styles for the user to select, for example, 2D cartoon, 3D effect, CG realistic effect, real scene, and the like.
In a specific application scenario, a style migration module may be pre-stored in the device and configured to perform style migration on the image, obtaining a conversion result in the user-selected style as the target generated image. The style migration module may be a CycleGAN model or the like.
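A dispatch sketch for such a module; the generators mapping is an assumption (pre-trained torch.nn.Module instances loaded elsewhere), since CycleGAN is named only as one possible model.

import torch

PRESET_STYLES = ("2d_cartoon", "3d_effect", "cg_realistic", "real_scene")

def convert_style(image_tensor: torch.Tensor, style: str, generators: dict) -> torch.Tensor:
    """Run the user-selected preset style through its generator; content and
    composition are preserved, only appearance changes."""
    if style not in PRESET_STYLES:
        raise ValueError("unknown preset style: " + style)
    with torch.no_grad():
        return generators[style](image_tensor.unsqueeze(0)).squeeze(0)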
In an implementation scenario, the super-resolution reconstruction instruction is used to perform super-resolution reconstruction on the image, improving its resolution so that the displayed image is clearer. Further, the device may determine a target resolution according to a user operation, and the super-resolution reconstruction instruction may instruct reconstruction of the image to that target resolution.
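One possible realization, assuming the diffusers StableDiffusionUpscalePipeline and its public x4 checkpoint; the specification fixes only the goal of reconstructing to a user-selected target resolution, not the model.

import torch
from PIL import Image
from diffusers import StableDiffusionUpscalePipeline

def super_resolve(image: Image.Image, target_size: tuple, prompt: str = "") -> Image.Image:
    pipe = StableDiffusionUpscalePipeline.from_pretrained(
        "stabilityai/stable-diffusion-x4-upscaler",
        torch_dtype=torch.float16).to("cuda")
    upscaled = pipe(prompt=prompt, image=image).images[0]   # fixed 4x factor
    # A final resize reaches the exact user-selected target resolution.
    return upscaled.resize(target_size, Image.LANCZOS)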
Steps S760 and S770, as well as the return to steps S720 to S750, are executed according to user instructions; of steps S760 and S770, only one is performed.
After the base map is generated in the above manner, it can be post-processed according to the user's needs, adjusting its style and definition to produce a digital image that meets those needs.
With the above scheme, after the base map is generated it is edited in various forms in real time according to user operations, so that the base map is adjusted to meet the user's needs without re-adjusting the input and regenerating. This simplifies the image generation procedure: throughout the process, the user only needs to give simple instructions to generate and edit an image, so that even users without any painting foundation can easily produce high-quality images.
Referring to fig. 8, fig. 8 is a schematic diagram of an image generating apparatus according to an embodiment of the application.
In the present embodiment, the image generating apparatus 80 includes a first acquisition module 81, a second acquisition module 82, a generation module 83, and an adjustment module 84. The first acquisition module 81 is used for acquiring a first generated image; in response to an image editing instruction for an area to be adjusted corresponding to the first generated image, the second acquisition module 82 is used for acquiring a reference image or a first prompt word associated with the area to be adjusted as first generation information; the generation module 83 is used for generating an adjustment image based on the first generation information; and the adjustment module 84 is used for adjusting the area to be adjusted with the adjustment image to obtain a second generated image.
The image editing instruction includes any one of a new instruction, a deletion instruction, a redrawing instruction, a toning instruction, and a deformation instruction, and the toning instruction and the deformation instruction each comprise a plurality of instruction categories.
Where the image editing instruction is a new instruction or a redrawing instruction, the first generation information includes the first prompt word, which is input by the user. The generation module 83 includes a first generation sub-module for generating, with the image generation model, an adjustment image matching the first prompt word.
Where the image editing instruction is a toning instruction or a deformation instruction, the first generation information includes the reference image, and the reference image consists of the pixels within the region to be adjusted. The generation module 83 includes a second generation sub-module configured to adjust the pixels contained in the reference image based on the current instruction category to obtain the adjustment image.
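A minimal OpenCV sketch of one toning category (a hue shift); the category set is left open by the description, so this is one illustrative choice.

import cv2
import numpy as np

def apply_toning(reference_bgr: np.ndarray, hue_shift: int) -> np.ndarray:
    """Shift the hue of the reference pixels to form the adjustment image."""
    hsv = cv2.cvtColor(reference_bgr, cv2.COLOR_BGR2HSV).astype(np.int32)
    hsv[..., 0] = (hsv[..., 0] + hue_shift) % 180   # OpenCV hue range is 0..179
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)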
Where the image editing instruction is a deletion instruction, the first generation information includes the reference image, and the reference image consists of pixels within a preset range around the region to be adjusted. The generation module 83 includes a third generation sub-module configured to generate, based on the pixels contained in the reference image, an adjustment image matching the region to be adjusted.
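For the deletion case, classical inpainting is one way to synthesize the adjustment image from the surrounding pixels; cv2.inpaint is an assumed concrete choice, and the 15-pixel radius stands in for the preset range.

import cv2
import numpy as np

def delete_region(image_bgr: np.ndarray, region_mask: np.ndarray) -> np.ndarray:
    """region_mask: uint8 mask, 255 inside the region to be adjusted."""
    # Fill the masked region from nearby pixels (Telea's method).
    return cv2.inpaint(image_bgr, region_mask, 15, cv2.INPAINT_TELEA)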
The image generating device 80 further includes a segmentation module configured to, before the reference image or the first prompt word associated with the region to be adjusted is acquired as the first generation information, perform semantic segmentation on the first generated image to obtain a semantic segmentation result, and, in response to a selection operation that selects the second sub-area or at least one first sub-area, take the selected area as the region to be adjusted. The semantic segmentation result includes a plurality of first sub-areas obtained by segmenting the first generated image, with the remaining area other than the first sub-areas serving as the second sub-area.
The first acquisition module 81 includes an acquisition sub-module, an extraction sub-module, and a fourth generation sub-module. The acquisition sub-module is used to obtain input information, and the extraction sub-module is used to extract the second generation information from it; the input information includes at least one of input voice, input text, and an input image, the second generation information is used to instruct the image generation model to generate the first generated image, and the second generation information includes at least one of second prompt words and input image information. The fourth generation sub-module is used to generate, with the image generation model, the first generated image matching the second generation information.
The image generating device 80 further includes a recognition module configured to, in the case where the input information includes input voice, perform voice recognition on the input voice to obtain a recognition text before the second generation information is extracted from the input information, and to update the input text with the recognition text. The extraction sub-module includes a keyword extraction unit configured to, in the case where the input information includes input text, extract a plurality of text keywords from the input text as the second prompt words.
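A toy sketch of the keyword extraction unit; a real system would use a trained extractor, and the stop-word list and frequency heuristic here are assumptions.

from collections import Counter

STOPWORDS = {"a", "an", "the", "of", "in", "on", "with", "and"}

def extract_text_keywords(input_text: str, top_k: int = 5) -> list:
    tokens = [t.strip(".,!?").lower() for t in input_text.split()]
    counts = Counter(t for t in tokens if t and t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_k)]

# extract_text_keywords("a cat sitting on a red sofa in a sunny room")
# -> ["cat", "sitting", "red", "sofa", "sunny"]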
The extraction sub-module includes a candidate extraction unit and a selection unit. In the case where the input information includes an input image and the input image is an image acquired by the device, the candidate extraction unit is used to perform scene recognition on the input image to obtain the scene category as a first candidate prompt word, perform semantic segmentation on the input image to obtain a plurality of object categories as second candidate prompt words, and perform keyword analysis on the input image to obtain image keywords as third candidate prompt words. In response to obtaining a specified similarity, the selection unit selects the second prompt words from the first, second, and third candidate prompt words based on the similarity interval in which the specified similarity lies; the specified similarity indicates the similarity between the input image and the first generated image, and the number of second prompt words is positively correlated with the value of the specified similarity.
The extraction sub-module includes a first analysis unit configured to, in the case where the input information includes an input image and the input image is an image acquired by the device, perform keyword analysis on the input image to obtain image keywords as the second prompt words, and obtain feature information of objects in the input image as the input image information.
The extraction sub-module includes a second analysis unit configured to, in the case where the input information includes an input image and the input image is an image drawn by the user, perform semantic analysis on the input image to obtain image semantics as the input image information.
The image generating device 80 further includes an updating module configured to update the second generated image to the first generated image after the region to be adjusted is adjusted with the adjustment image to obtain the second generated image. The image generating apparatus 80 further includes an editing module configured to re-execute, in response to acquiring an image editing instruction, the step of acquiring the reference image or the first prompt word associated with the region to be adjusted as the first generation information; an output module configured to, in response to obtaining an image direct output instruction, take the current first generated image as the target generated image; and a conversion output module configured to, in response to obtaining an image conversion output instruction, convert the current first generated image based on that instruction to obtain the target generated image, the image conversion output instruction including at least one of a style conversion instruction and a super-resolution reconstruction instruction.
Referring to fig. 9, fig. 9 is a schematic framework diagram of an electronic device according to an embodiment of the application.
In this embodiment, the electronic device 90 includes a memory 91 and a processor 92 coupled to each other. Specifically, the various components of the electronic device 90 may be coupled together by a bus, or may each be individually coupled to the processor 92. The electronic device 90 may be any device having processing capability, such as a computer, a tablet, or a mobile phone.
The memory 91 is used to store program instructions executed by the processor 92 and data produced during processing by the processor 92, for example the first generated image and the second generated image. The memory 91 includes a non-volatile storage portion for storing the above-mentioned program instructions.
The processor 92 controls the operation of the electronic device 90 and may also be referred to as a CPU (Central Processing Unit). The processor 92 may be an integrated circuit chip with signal processing capability, or a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. In addition, the processor 92 may be jointly implemented by a plurality of integrated-circuit chips.
The processor 92 is configured to invoke the program instructions stored in the memory 91 and execute them to implement any of the image generation methods described above.
Referring to fig. 10, fig. 10 is a schematic framework diagram of an embodiment of a computer readable storage medium according to the present application.
In this embodiment, the computer readable storage medium 100 stores program instructions 101 executable by a processor, and the program instructions 101, when executed, implement any of the above-mentioned image generation methods.
The computer readable storage medium 100 may be a medium capable of storing program instructions, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc; it may also be a server storing the program instructions, which may send the stored program instructions to another device for execution or execute them itself.
In some embodiments, the computer readable storage medium 100 may also be a memory as shown in FIG. 9.
The foregoing description covers only embodiments of the present application and is not intended to limit its patent scope; any equivalent structure or equivalent process transformation made using the description and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present application.

Claims (14)

1. An image generation method, comprising:

acquiring a first generated image;

responding to an image editing instruction for a region to be adjusted corresponding to the first generated image, and acquiring a reference image or a first prompt word associated with the region to be adjusted as first generation information;

generating an adjustment image based on the first generation information;

adjusting the region to be adjusted by using the adjustment image to obtain a second generated image;

wherein the first generated image is generated by an image generation model based on second generation information, and the second generation information is extracted from input information; in the case where the input information comprises an input image and the input image is an image acquired by a device, the extracting of the second generation information from the input information comprises the following steps:

performing scene recognition on the input image to obtain a scene category as a first candidate prompt word, performing semantic segmentation on the input image to obtain a plurality of object categories as second candidate prompt words, and performing keyword analysis on the input image to obtain image keywords as third candidate prompt words;

in response to obtaining a specified similarity, selecting second prompt words as the second generation information from the first, second, and third candidate prompt words based on the similarity interval in which the specified similarity lies; the specified similarity is used for indicating the similarity between the input image and the first generated image, and the number of the second prompt words is positively correlated with the value of the specified similarity.
2. The method of claim 1, wherein the image editing instruction comprises any one of a new instruction, a deletion instruction, a redrawing instruction, a toning instruction, and a deformation instruction, and the toning instruction and the deformation instruction each comprise a plurality of instruction categories.
3. The method according to claim 2, wherein, in the case where the image editing instruction is the new instruction or the redrawing instruction, the first generation information includes the first prompt word, and the first prompt word is input by a user;
the generating an adjustment image based on the first generation information includes:
and generating the adjustment image matched with the first prompt word by using an image generation model.
4. The method according to claim 2, wherein, in the case where the image editing instruction is the toning instruction or the deformation instruction, the first generation information includes the reference image, and the reference image includes pixels within the region to be adjusted;
the generating an adjustment image based on the first generation information includes:
and adjusting pixels contained in the reference image based on the current instruction category to obtain the adjusted image.
5. The method according to claim 2, wherein in the case where the image editing instruction is the deletion instruction, the first generation information includes the reference image, and the reference image includes pixels within a preset range from the region to be adjusted;
the generating an adjustment image based on the first generation information includes:
and generating the adjustment image matched with the area to be adjusted based on the pixels contained in the reference image.
6. The method according to claim 1, wherein, before the acquiring of the reference image or the first prompt word associated with the region to be adjusted as the first generation information in response to the image editing instruction for the region to be adjusted corresponding to the first generated image, the method further comprises:

carrying out semantic segmentation on the first generated image to obtain a semantic segmentation result; wherein the semantic segmentation result comprises a plurality of first sub-areas obtained by segmenting the first generated image, and the remaining area other than the first sub-areas serves as a second sub-area;

and responding to a selection operation of selecting the second sub-area or at least one first sub-area, and taking the selected area as the region to be adjusted.
7. The method of claim 1, wherein the acquiring the first generated image comprises:
acquiring input information, and extracting second generation information from the input information; wherein the input information comprises at least one of input voice, input text and input image, the second generation information is used for indicating an image generation model to generate the first generation image, and the second generation information comprises at least one of second prompt words and input image information;
generating the first generated image matched with the second generated information by using an image generation model.
8. The method of claim 7, wherein, in the case where the input information includes the input speech, before the extracting second generation information from the input information, the method further comprises:
Performing voice recognition on the input voice to obtain a recognition text, and updating the input text by using the recognition text;
in the case where the input information includes the input text, the extracting second generation information from the input information includes:
and extracting a plurality of text keywords from the input text, and taking the text keywords as the second prompt words.
9. The method of claim 7, wherein, in the case where the input information includes an input image and the input image is an image acquired by a device, the extracting second generation information from the input information includes:
and carrying out keyword analysis on the input image to obtain image keywords as the second prompt words, and obtaining feature information of an object in the input image as the input image information.
10. The method of claim 7, wherein, in the case where the input information includes an input image and the input image is an image drawn by a user, the extracting second generation information from the input information includes:
and carrying out semantic analysis on the input image to obtain image semantics as the input image information.
11. The method of claim 1, wherein after adjusting the region to be adjusted using the adjustment image to obtain a second generated image, the method further comprises:
updating the second generated image to the first generated image;
in response to obtaining the image editing instruction, re-executing the step of obtaining a reference image or a first prompt word associated with the region to be adjusted as first generation information;
responding to an obtained image direct output instruction, and taking the current first generated image as a target generated image;
responding to an obtained image conversion output instruction, and converting the current first generated image based on the image conversion output instruction to obtain a target generated image; wherein the image conversion output instruction includes at least one of a style conversion instruction and a super resolution reconstruction instruction.
12. An image generation apparatus, characterized in that the image generation apparatus comprises:

a first acquisition module, used for acquiring a first generated image;

a second acquisition module, used for acquiring, in response to an image editing instruction for a region to be adjusted corresponding to the first generated image, a reference image or a first prompt word associated with the region to be adjusted as first generation information;

a generation module, used for generating an adjustment image based on the first generation information;

an adjustment module, used for adjusting the region to be adjusted by using the adjustment image to obtain a second generated image;

wherein the first generated image is generated by an image generation model based on second generation information, and the second generation information is extracted from input information; the first acquisition module comprises an extraction sub-module, and the extraction sub-module comprises a candidate extraction unit and a selection unit; in the case where the input information comprises an input image and the input image is an image acquired by a device, the candidate extraction unit is used for performing scene recognition on the image acquired by the device to obtain a scene category as a first candidate prompt word, performing semantic segmentation on the image acquired by the device to obtain a plurality of object categories as second candidate prompt words, and performing keyword analysis on the image acquired by the device to obtain image keywords as third candidate prompt words; in response to obtaining a specified similarity, the selection unit is used for selecting second prompt words as the second generation information from the first, second, and third candidate prompt words based on the similarity interval in which the specified similarity lies; the specified similarity is used for indicating the similarity between the image acquired by the device and the first generated image, and the number of the second prompt words is positively correlated with the value of the specified similarity.
13. An electronic device comprising a memory and a processor coupled to each other, the processor being configured to execute program instructions stored in the memory to implement the image generation method of any one of claims 1 to 11.
14. A computer readable storage medium having stored thereon program instructions, which when executed by a processor, implement the image generation method of any of claims 1 to 11.
CN202310856919.XA 2023-07-13 2023-07-13 Image generation method, device, electronic equipment and computer readable storage medium Active CN116580127B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310856919.XA CN116580127B (en) 2023-07-13 2023-07-13 Image generation method, device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310856919.XA CN116580127B (en) 2023-07-13 2023-07-13 Image generation method, device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN116580127A CN116580127A (en) 2023-08-11
CN116580127B true CN116580127B (en) 2023-12-01

Family

ID=87540038

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310856919.XA Active CN116580127B (en) 2023-07-13 2023-07-13 Image generation method, device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN116580127B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117271756A (en) * 2023-11-21 2023-12-22 安徽淘云科技股份有限公司 Text generation method, device, electronic equipment and readable medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102210250B1 (en) * 2020-10-13 2021-02-01 주식회사 웨이센 Method for visualizing result of prediction using AI(Artificial Intelligence) prediction model
CN113448477A (en) * 2021-08-31 2021-09-28 南昌航空大学 Interactive image editing method and device, readable storage medium and electronic equipment
CN116168119A (en) * 2023-02-28 2023-05-26 北京百度网讯科技有限公司 Image editing method, image editing device, electronic device, storage medium, and program product
CN116306588A (en) * 2023-03-28 2023-06-23 阿里巴巴(中国)有限公司 Interactive-based image generation method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Image caption generation combining visual features and scene semantics; Li Zhixin; Wei Haiyang; Huang Feicheng; Zhang Canlong; Ma Huifang; Shi Zhongzhi; Chinese Journal of Computers (No. 09); full text *

Also Published As

Publication number Publication date
CN116580127A (en) 2023-08-11

Similar Documents

Publication Publication Date Title
Song et al. Geometry-aware face completion and editing
Cao et al. Masactrl: Tuning-free mutual self-attention control for consistent image synthesis and editing
US11880766B2 (en) Techniques for domain to domain projection using a generative model
KR101635730B1 (en) Apparatus and method for generating montage, recording medium for performing the method
KR102304674B1 (en) Facial expression synthesis method and apparatus, electronic device, and storage medium
CN116580127B (en) Image generation method, device, electronic equipment and computer readable storage medium
WO2007100470A1 (en) Object-level image editing
US11386589B2 (en) Method and device for image generation and colorization
CN109859095B (en) Automatic cartoon generation system and method
JP6929322B2 (en) Data expansion system, data expansion method, and program
Liu et al. Sketch-to-art: Synthesizing stylized art images from sketches
US11379992B2 (en) Patch expansion for segmentation network training
WO2021076381A1 (en) Automatically generating a storyboard
CN108230236B (en) Digital image automatic imposition method and digitally published picture imposition method
Ardino et al. Semantic-guided inpainting network for complex urban scenes manipulation
Kafri et al. Stylefusion: Disentangling spatial segments in stylegan-generated images
Chen et al. A review of image and video colorization: From analogies to deep learning
Xie et al. Interactive vectorization
Yang et al. Context-aware unsupervised text stylization
Ye et al. Hybrid scheme of image’s regional colorization using mask r-cnn and Poisson editing
Ju et al. Brushnet: A plug-and-play image inpainting model with decomposed dual-branch diffusion
Togo et al. Text-guided style transfer-based image manipulation using multimodal generative models
CN113554549A (en) Text image generation method and device, computer equipment and storage medium
Liu et al. Flexible portrait image editing with fine-grained control
Shee et al. Montagegan: Generation and assembly of multiple components by gans

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant