CN117649461A - Interactive image generation method and system based on space layout and use method thereof - Google Patents
- Publication number
- CN117649461A (application CN202410115444.3A)
- Authority
- CN
- China
- Prior art keywords
- image
- target
- attribute
- reference image
- code
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0475—Generative networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/094—Adversarial learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/60—Editing figures and text; Combining figures or text
Abstract
The invention discloses an interactive image generation method and system based on spatial layout, and a method of using the same. The interactive image generation method comprises the following steps: importing a target image and a reference image, wherein the target image is the original image of the generation task desired by the user, and the reference image is an image that provides characteristic attributes for the target image; transferring the characteristic attributes of the reference image to the target image, and adjusting the weight with which those attributes influence the target image by controlling the distance between the target image and the reference image; and generating a new target image based on the adjusted target image and reference image. The invention lets users adjust image positions to control the generated result, enhancing visual interaction and efficiency. This approach gives users better control over the results and greater creative freedom.
Description
Technical Field
The invention belongs to the technical field of picture generation, and particularly relates to an interactive image generation method and system based on spatial layout and a use method thereof.
Background
Although generative adversarial networks (GANs) are impressively capable, controlling the style and content of the images they generate remains a challenge. Active control over the style and content of GAN-generated images is critical for meeting specific requirements in practical applications. This has driven the emergence of GAN image generation tools that let users express their creativity and ideas. In addition, personalized generation has gained acceptance, yielding various tools aimed at meeting users' wishes.
However, existing tools still lack control flexibility and user friendliness. For example, sketch-based tools typically require the user to have specific drawing skills or image editing experience, and the resulting images often fail to reach the desired level of realism. Slider-based tools offer limited options, ignoring diverse user requirements and creative expression. Text-based tools rely on abstract input, leaving a gap between the user's expectations and the results obtained. Some tools also suffer usability problems such as cluttered interfaces and complex operation. As users increasingly seek freedom and personalization from generation tools, traditional tool modes are no longer adequate for specific scenarios.
Disclosure of Invention
The invention aims to provide a novel tool specially designed for flexible image generation. Its main goal is a multifunctional image generation platform that goes beyond traditional control functions. Through an innovative 2D layout design, the user can easily adjust image positions to control the generated result, enhancing visual interaction and efficiency. The method gives users better control over the result and greater creative freedom. In addition, the tool integrates real-world images as references, so the user can use the attributes of existing images to guide the generation of the target image. This makes the result predictable, stimulates creative exploration and experimentation, and fosters a strong connection between users and content.
To achieve the above object, the present invention provides a spatial layout-based interactive image generation method, comprising:
importing a target image and a reference image; wherein the target image is the original image of the generation task desired by the user, and the reference image is an image for providing characteristic attributes for the target picture;
transferring the characteristic attribute of the reference image to the target image, and adjusting the influence weight of the characteristic attribute of the reference image on the target image by controlling the distance between the target image and the reference image;
and generating a new target image according to the final weight.
Optionally, the characteristic attribute of the reference image includes: local and global properties;
the local attributes include: eyes, nose, mouth and hair;
the global attributes include: makeup, age, face shape and head orientation.
Optionally, transferring the local attribute of the reference image to the target image comprises:
providing a corresponding mask for each reference image by a mask preprocessing method; when a local attribute is selected from the reference image, identifying the mask corresponding to the selected local attribute; combining the identified mask with the reference image to extract the region of the local attribute;
adding the extracted region to the target image to create a new input image; processing the input image and the target image using a pre-trained encoder to generate two corresponding latent vectors Code_t and Code_i;
adding Code_t and Code_i by weighting and inputting the result into a pre-trained image generator to generate the local-attribute transfer result.
Optionally, transferring the global attribute of the reference image to the target image comprises:
selecting, from a pre-collected image dataset, a base image aligned with the reference image; wherein the base image is an aligned image that differs from the reference image in only one attribute and is used for extracting the global features of the reference image;
inputting the base image, the target image and the reference image into a pre-trained encoder to obtain latent vectors Code_t, Code_r and Code_b;
extracting a representation of the selected global attribute by subtracting Code_b from Code_r, and adding Code_t to the difference to generate a new latent code;
inputting the new latent code into a pre-trained image generator to generate a single global-attribute transfer image.
Optionally, the method for adjusting the influence weight of the characteristic attribute of the reference image on the target image by controlling the distance between the target image and the reference image comprises the following steps:
$$w_i = \frac{k}{\lVert tar - ref_i \rVert}, \qquad I_{out} = G\left(\mathrm{Code}_t + \sum_{i=1}^{n} w_i \cdot \mathrm{Code}_i\right)$$
where $w_i$ is the distance weight, $k$ is a constant, $tar$ is the (x, y) coordinates of the target image, $ref_i$ is the (x, y) coordinates of the i-th reference image, $I_{out}$ is the finally generated picture, $G$ is the generator, $n$ is the number of reference images, and $\mathrm{Code}_i$ is the latent vector of the i-th reference image.
To achieve the above object, the present invention also provides an interactive image generation system based on spatial layout, including: the device comprises an importing module, an adjusting module and a generating module;
the importing module is used for importing a target image and a reference image; wherein the target image is the original image of the generation task desired by the user, and the reference image is an image for providing characteristic attributes for the target picture;
the adjusting module is used for transferring the characteristic attribute of the reference image to the target image and adjusting the influence weight of the characteristic attribute of the reference image on the target image by controlling the distance between the target image and the reference image;
the generation module is used for generating a new target image based on the adjusted target image and the reference image.
Optionally, the characteristic attribute of the reference image includes: local and global properties;
the local attributes include: eyes, nose, mouth and hair;
the global attributes include: makeup, age, face shape and head orientation.
To achieve the above object, the present invention further provides a method for using an interactive image generation system based on spatial layout, including:
importing a target picture and a reference picture;
and selecting the characteristic attribute of the reference picture, and controlling the generation effect by adjusting the distance between the target picture and the reference picture.
The invention has the following beneficial effects:
the invention leads in the target image and the reference image; transferring the characteristic attribute of the reference image to the target image, and adjusting the influence weight of the characteristic attribute of the reference image on the target image by controlling the distance between the target image and the reference image; the user can easily adjust the image position to control the generated result, and the intuitive interaction and efficiency are enhanced. This approach allows users to better control the results and increase their degrees of freedom of authoring. Furthermore, the tool integrates real world images as a reference, enabling a user to use the properties of existing images to guide the generation of target images. This not only aids in the outcome expectations, but also motivates creative exploration and experimentation, thereby facilitating a more secure user to content contact.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application, illustrate and explain the application and are not to be construed as limiting the application. In the drawings:
FIG. 1 is a schematic flow chart of an interactive image generation method based on spatial layout according to an embodiment of the invention;
FIG. 2 is a schematic diagram of an interface design for generating pictures according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a local attribute transfer according to an embodiment of the present invention;
FIG. 4 is a global attribute transfer diagram of an embodiment of the present invention;
FIG. 5 is a diagram of an imported image according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating a picture import to a user workspace according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating an attribute selection box according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of controlling the intensity of a generated effect by dragging a reference picture according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a generated picture effect according to an embodiment of the present invention;
FIG. 10 is a mask exemplary diagram of an embodiment of the present invention;
FIG. 11 is a schematic diagram illustrating the distinction between a base image and a reference image according to an embodiment of the present invention.
Detailed Description
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
Example 1
As shown in fig. 1, the present embodiment provides a spatial layout-based interactive image generation method, including:
importing a target image and a reference image; wherein the target image is the original image of the generation task desired by the user, and the reference image is an image for providing characteristic attributes for the target picture;
transferring the characteristic attribute of the reference image to the target image, and adjusting the influence weight of the characteristic attribute of the reference image on the target image by controlling the distance between the target image and the reference image;
based on the adjusted target image and the reference image, a new target image is generated.
Further, the characteristic attributes of the reference image include: local and global properties;
the local attributes include: eyes, nose, mouth and hair;
the global attributes include: makeup, age, face shape and head orientation.
Further, transferring the local attribute of the reference image to the target image includes:
providing a corresponding mask for each reference image by a mask preprocessing method; when a local attribute is selected from the reference image, the tool automatically identifies the related mask; the mask is combined with the reference image to extract the region of the local attribute;
adding the extracted region to the target image to create a new input image; processing the input image and the target image using a pre-trained encoder to generate two corresponding latent vectors Code_t and Code_i;
adding Code_t and Code_i by weighting and inputting the result into a pre-trained image generator to generate the local-attribute transfer result.
Further, transferring the global attribute of the reference image to the target image includes:
selecting, from a pre-collected dataset, a base image aligned with the reference image;
inputting the base image, the target image and the reference image into a pre-trained encoder to obtain latent vectors Code_t, Code_r and Code_b;
extracting a representation of the selected global attribute by subtracting Code_b from Code_r, and adding Code_t to the difference to generate a new latent code;
inputting the new latent code into a pre-trained image generator to generate a single global-attribute transfer image.
This embodiment implements the interactive image generation method based on spatial layout as follows:
data preparation: collected from FFHQ dataset and head pose dataset and truncated in the result map of Image2style gan and BeautyGAN. These data are all portrait.
Model preparation: generating pictures from the real world with a GAN requires two components: a generator (typically StyleGAN) that produces the results, and an encoder (a GAN-inversion technique) that maps the features of real-world pictures into StyleGAN's latent space. This embodiment uses StyleGAN2 as the picture generator and e4e (encoder4editing) as the encoder. Pre-trained models are used for both, so the tool of this embodiment involves no model training.
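A minimal sketch of this model-preparation step is shown below, assuming the pre-trained e4e and StyleGAN2 weights are available as serialized PyTorch modules; the checkpoint file names are illustrative placeholders, not paths given in this embodiment.

```python
# Minimal model-preparation sketch (assumption: checkpoints are whole
# serialized modules; file names are placeholders).
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# e4e maps a real portrait into the generator's latent space (GAN inversion).
encoder = torch.load("e4e_ffhq_encode.pt", map_location=device)    # placeholder path
# StyleGAN2 maps a latent code back to a portrait.
generator = torch.load("stylegan2_ffhq.pt", map_location=device)   # placeholder path
encoder.eval()
generator.eval()

@torch.no_grad()
def encode(img_batch):
    """Return the latent Code of a preprocessed image batch."""
    return encoder(img_batch.to(device))

@torch.no_grad()
def generate(code):
    """Synthesize an image from a latent Code."""
    return generator(code)
```

The later sketches in this embodiment reuse these encode() and generate() helpers.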
Interface design: as shown in FIG. 2, this embodiment proposes a mode of picture generation in which attributes are supplied to a target picture by importing reference pictures from the real world. A completely new two-dimensional user interface was therefore designed. The interface of the system was written with PyQt5, which is a novelty of this embodiment. In this two-dimensional interface, the target picture sits in the middle of the workspace, and reference pictures can be placed anywhere in the workspace by the user. A reference picture provides attributes for the target picture, for example adding the glasses in the reference picture to the target picture. The embodiment also designs an attribute selection box through which the user can autonomously select attributes of a reference picture. In addition, the user controls how strongly a picture's attribute appears in the generated result by adjusting the position of the reference picture: the closer the reference picture is to the target picture, the stronger the attribute, and vice versa.
User flow: the user may import an image from the real world or from a provided dataset through the reference image column to serve as a reference image. The reference image provides attributes for the generated image. This embodiment also introduces a novel 2D spatial layout: the user imports the image to be generated (the target image) by clicking the center area of the spatial-layout workspace. The user can then add reference images by clicking or dragging them from the reference image column into the workspace; connecting lines are added to display their relationships. Right-clicking a reference image opens an attribute selection box that lets the user choose the desired attributes from the image.
Thereafter, the user may move the reference image to control the effect of the selected attribute on the generated result. The closer the reference image is to the target image, the more similar they are and vice versa. This embodiment allows the user to redo, undo, or reset adjustments in the generation process by clicking the corresponding button. Furthermore, the present embodiment changes the color and thickness of the connection line accordingly to further visualize the effect. Integrating these interactive functions into the 2D workspace greatly enhances the user friendliness and intuitiveness of the tools of the present embodiment.
The following describes the principles of the algorithms behind the interface. Attribute transfer describes how the features of a reference picture are added to the target picture, and distance-weight calculation describes how the user controls the generated effect by adjusting the distances between pictures.
Attribute transfer.
Through a review of existing work, this embodiment finds that picture attributes are generally divided into two categories: local attributes (e.g., glasses, nose, mouth) and global attributes (e.g., makeup, age, face shape). Global and local attributes call for different methods.
$$I_{\text{local}} = G\left(\mathrm{Code}_t + w_i \cdot \mathrm{Code}_i\right)$$
where $I_{\text{local}}$ denotes the local-attribute transfer result.
Local attributes. This embodiment employs a mask preprocessing method to provide a corresponding mask for each reference image. Thanks to the standardization of GAN-based image generation, masks can be shared among most images. When the user selects an attribute (e.g., the mouth) from reference image i, the tool automatically identifies the associated mask. The mask is combined with the reference image to extract the mouth region. The extracted region is then added to the target image to create a new input image. The input and target images are processed with the pre-trained e4e encoder to produce two corresponding latent vectors, Code_t and Code_i. Finally, their weighted addition is input into the pre-trained StyleGAN2 generator to produce the single-mouth attribute transfer result. Here Code_t is the latent vector of the target image and Code_i is the latent vector for the i-th reference image.
The mouth mask in FIG. 3 is such a mask. The specific procedure (as shown in FIG. 3): the user imports the target picture and the reference picture -> the user selects an attribute of the reference picture (e.g., the mouth) -> the system automatically selects the mask representing the mouth -> the reference picture is combined with the mask to extract the position of the mouth in the reference picture -> the extracted mouth region is combined with the target picture to obtain the fused picture -> the fused picture and the target picture are passed through the encoder. An example mask is shown in FIG. 10.
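Below is a minimal sketch of this local-attribute transfer, assuming images as float tensors in [0, 1], a binary 0/1 mask of the same spatial size, and the encode()/generate() helpers from the model-preparation sketch; the blending weight w comes from the distance-weight calculation described later.

```python
# Local-attribute transfer sketch (assumptions: encode()/generate() from the
# model-preparation sketch; mask is a 0/1 tensor aligned with both images).
import torch

def transfer_local_attribute(target_img, reference_img, mask, w):
    # 1. Extract the selected region (e.g. the mouth) from the reference and
    #    paste it onto the target to form the new input image.
    input_img = target_img * (1 - mask) + reference_img * mask

    # 2. Encode both images into latent vectors Code_t and Code_i.
    code_t = encode(target_img.unsqueeze(0))
    code_i = encode(input_img.unsqueeze(0))

    # 3. Weighted addition of the codes, decoded by the generator.
    return generate(code_t + w * code_i)
```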
$$I_{\text{global}} = G\left(\mathrm{Code}_t + (\mathrm{Code}_r - \mathrm{Code}_b)\right)$$
where $I_{\text{global}}$ denotes the global-attribute transfer result.
Global attributes. For global attributes the above algorithm is not applicable, so this embodiment adopts an image-code subtraction method. When the user selects a global attribute (e.g., makeup) in the attribute selection box, the system selects from the pre-collected portrait dataset a corresponding base image aligned with the reference image; FIG. 11 shows the distinction between base image and reference image, which differ in only one attribute. The target image, reference image and base image are then input into the e4e encoder to obtain the corresponding latent vectors Code_t, Code_r and Code_b. Subtracting Code_b from Code_r extracts a representation of the makeup-related attribute; Code_t is then added to this difference to generate a new latent code. Finally, the new code is input into the StyleGAN2 generator to generate a single global-attribute transfer image, as shown in FIG. 4. Here Code_t corresponds to the target image, Code_r to the reference image, and Code_b to the base image.
The role of the base image is to extract the global features of the reference image, since global features cannot be obtained through a mask as easily as local features. Therefore the base image differs from its corresponding reference picture in only one global attribute. For example, if the reference image is a woman wearing makeup, the corresponding base image is the same woman without makeup; the person is identical in both images, and the expression, pose, etc. are approximately identical (the GAN generator allows slight differences). The calculation of this embodiment can be viewed simply as (code of the reference image) − (code of the base image); the base image is called the base because it is the subtrahend used to extract the features of the reference image.
The base image differs from its corresponding reference picture in only one global attribute and is itself an ordinary image.
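A minimal sketch of this global-attribute transfer follows, again assuming the encode()/generate() helpers above and that the aligned base image (differing from the reference in only the selected attribute) has already been retrieved from the dataset.

```python
# Global-attribute transfer sketch via image-code subtraction (assumptions:
# encode()/generate() from the model-preparation sketch; base_img is aligned
# with reference_img and differs in only the selected global attribute).
import torch

def transfer_global_attribute(target_img, reference_img, base_img, w=1.0):
    code_t = encode(target_img.unsqueeze(0))     # Code_t
    code_r = encode(reference_img.unsqueeze(0))  # Code_r
    code_b = encode(base_img.unsqueeze(0))       # Code_b

    # Code_r - Code_b isolates the selected attribute (e.g. makeup);
    # adding it to Code_t applies the attribute to the target.
    return generate(code_t + w * (code_r - code_b))
```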
Distance-weight calculation.
To make the distance between the reference image and the target image affect the generated result, the tool of this embodiment automatically calculates this distance and assigns the reference image an attribute weight inversely proportional to it. Specifically, in the two-dimensional workspace, the farther a reference picture is from the target picture, the fewer of its features appear in the generated result; the closer it is, the more of its features appear.
Finally, for multiple reference pictures, this embodiment performs a weighted summation of the Code_i generated for each reference picture and adds the sum to Code_t.
$$w_i = \frac{k}{\lVert tar - ref_i \rVert}, \qquad I_{out} = G\left(\mathrm{Code}_t + \sum_{i=1}^{n} w_i \cdot \mathrm{Code}_i\right)$$
where $w_i$ is the distance weight, $k$ is a constant (chosen through a few simple tests), $tar$ is the (x, y) coordinates of the target image, $ref_i$ is the (x, y) coordinates of the i-th reference image, $I_{out}$ is the finally generated picture, $G$ is the generator, $n$ is the number of reference images, and $\mathrm{Code}_i$ is the latent vector of the i-th reference image.
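The sketch below follows the formula above, assuming the per-reference latent vectors Code_i have already been produced by the attribute-transfer steps; the constant k and the zero-distance guard are illustrative.

```python
# Distance-weight calculation and multi-reference combination sketch
# (assumptions: generate() from the model-preparation sketch; codes are the
# per-reference latent vectors Code_i; k is an empirically chosen constant).
import math
import torch

def distance_weight(tar_xy, ref_xy, k=1.0):
    """Inverse-distance weight: a closer reference gets a larger weight."""
    d = math.hypot(tar_xy[0] - ref_xy[0], tar_xy[1] - ref_xy[1])
    return k / max(d, 1e-6)  # guard against division by zero

def combine_references(code_t, codes, tar_xy, ref_positions, k=1.0):
    """G(Code_t + sum_i w_i * Code_i) over all reference pictures."""
    result = code_t.clone()
    for code_i, ref_xy in zip(codes, ref_positions):
        result = result + distance_weight(tar_xy, ref_xy, k) * code_i
    return generate(result)
```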
Example 2
The embodiment also provides an interactive image generation system based on space layout, which comprises: the device comprises an importing module, an adjusting module and a generating module;
the importing module is used for importing a target image and a reference image; wherein the target image is the original image of the generation task desired by the user, and the reference image is an image for providing characteristic attributes for the target picture;
the adjusting module is used for transferring the characteristic attribute of the reference image to the target image and adjusting the influence weight of the characteristic attribute of the reference image on the target image by controlling the distance between the target image and the reference image;
and the generation module is used for generating a new target image based on the adjusted target image and the reference image.
Further, the characteristic attributes of the reference image include: local and global properties;
the local attributes include: eyes, nose, mouth and hair;
the global attributes include: makeup, age, face shape and head orientation.
The workflow of the interactive image generation system based on spatial layout proposed in this embodiment is as follows:
Two terms are explained first. 1. Target picture: the original picture of the user's generation task; the user imports this picture into the system and obtains the new picture on its basis. 2. Reference picture: a picture used to provide features for the target picture. For example, if the user wants to give the target picture a different smile, they select a reference picture with that smile and choose the smile attribute through feature selection. Reference pictures are typically selected from the real world by the user.
The user first selects a target picture and reference pictures (the reference pictures are imported into the reference picture column), as shown in FIG. 5.
Next, the user can click a picture in the reference picture column to import it into the user workspace. FIG. 6 shows four imported reference pictures; note that they are draggable. In addition, when a reference picture is imported into the workspace, a weight line is automatically drawn between it and the target picture. The line's color and thickness change with the distance between the two pictures and visualize the weight of the reference picture in the subsequent calculation.
At the same time, the user can click the button on the right side of the workspace, called the generate button. Clicking it produces the corresponding result.
Further, this embodiment provides the user with an attribute selection box (as shown in FIG. 7), opened by right-clicking a reference picture. The user selects which attributes of the reference picture appear in the result; for example, if only the age attribute of the picture is selected, the generated picture shows the person older than in the original. The attribute selection box currently offers 8 attributes: eyes, nose, mouth, hair, age, face shape, head orientation and makeup.
In addition, the user can drag a reference picture with the left mouse button. By dragging the reference picture, the user controls the intensity of the generated effect, as shown in FIG. 8.
Specific implementation of the functions:
the tool is mainly written using PyQt5 and matplotlib. QT designer in PyQt5 used by the user interface of the tool. The generation network used by this tool is stylegan2, while in order to be able to use pictures in the real world for generation tasks, a gan conversion technique is required, this embodiment uses e4e (an encoder).
Definition of reference pictures: defined with QGraphicsPixmapItem, in a class named class GraphicItem(QGraphicsPixmapItem). This class has two key methods. One is def mouseMoveEvent(self, event), through which the reference picture can be dragged. The other is def mousePressEvent(self, event), which defines two user operations, middle-click and right-click, corresponding respectively to deleting the picture and opening the attribute selection box.
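A minimal PyQt5 sketch of this item class is given below; the open_attribute_box callback is an assumed hook supplied by the surrounding workspace, not a name from this embodiment.

```python
# Reference-picture item sketch (assumption: open_attribute_box is a callback
# provided by the workspace that opens the attribute selection box).
from PyQt5.QtCore import Qt
from PyQt5.QtWidgets import QGraphicsPixmapItem

class GraphicItem(QGraphicsPixmapItem):
    def __init__(self, pixmap, open_attribute_box, parent=None):
        super().__init__(pixmap, parent)
        self.setFlag(QGraphicsPixmapItem.ItemIsMovable)  # enables left-drag
        self._open_attribute_box = open_attribute_box

    def mousePressEvent(self, event):
        if event.button() == Qt.MiddleButton:
            self.scene().removeItem(self)        # middle-click: delete picture
        elif event.button() == Qt.RightButton:
            self._open_attribute_box(self)       # right-click: attribute box
        else:
            super().mousePressEvent(event)       # left-click: begin drag

    def mouseMoveEvent(self, event):
        super().mouseMoveEvent(event)            # default drag behaviour
        # the workspace would recompute the weight line here
```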
Reading pictures (importing the target picture and reference pictures): the pictures and their paths are read with a QFileDialog dialog.
Moving: since def mouseMoveEvent(self, event) is defined on the reference picture class, the user can drag a reference picture with the left mouse button.
Feature selection box: designed with PyQt5 and QDialog. Each reference picture has a list storing 8 variables corresponding to the 8 attributes. Initially all values are true; when the user deselects an attribute, the corresponding variable changes from true to false, meaning that attribute of the reference picture no longer participates in the final generated effect, and vice versa.
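A minimal sketch of this per-picture attribute state follows; the class and method names are illustrative.

```python
# Per-reference attribute state sketch: the 8 attributes all start enabled
# and are toggled by the selection box (names are illustrative).
ATTRIBUTES = ["eyes", "nose", "mouth", "hair",
              "age", "face_shape", "head_orientation", "makeup"]

class AttributeState:
    def __init__(self):
        # every attribute participates until the user deselects it
        self.enabled = {name: True for name in ATTRIBUTES}

    def toggle(self, name):
        self.enabled[name] = not self.enabled[name]

    def active(self):
        """Attributes that take part in the final generated effect."""
        return [n for n, on in self.enabled.items() if on]
```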
Weight line: implemented with QGraphicsLineItem. The distance between the reference picture and the target picture is calculated first, and the thickness and color of the weight line are then adjusted. Both are inversely related to the distance: the closer the pictures, the thicker and darker the line; the farther apart, the thinner and lighter the line.
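The sketch below shows one way to update such a line, assuming a linear fall-off; the mapping constants are illustrative, not values specified in this embodiment.

```python
# Weight-line update sketch (assumption: linear fall-off with distance;
# max_dist and the width/shade ranges are illustrative constants).
import math
from PyQt5.QtCore import QLineF
from PyQt5.QtGui import QColor, QPen

def update_weight_line(line_item, tar_pos, ref_pos, max_dist=800.0):
    """Closer reference -> thicker, darker line; farther -> thinner, lighter."""
    d = math.hypot(tar_pos.x() - ref_pos.x(), tar_pos.y() - ref_pos.y())
    closeness = max(0.0, 1.0 - d / max_dist)   # 1 when touching, 0 when far
    width = 1 + 5 * closeness                  # 1..6 px
    shade = int(200 - 180 * closeness)         # darker grey when close
    line_item.setPen(QPen(QColor(shade, shade, shade), width))
    line_item.setLine(QLineF(tar_pos, ref_pos))
```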
Picture generation process: all reference pictures and the target picture are passed through the encoder (e4e) to produce their corresponding codes. According to the selections in the feature selection box, the codes are added with certain weights; the addition follows the attribute-transfer and distance-weight calculations above. The final code is then passed through the generator (StyleGAN2) to produce a picture with the corresponding effect.
Example 3
The embodiment provides a using method of an interactive image generating system based on space layout, which comprises the following steps:
importing a target picture and a reference picture;
and selecting the characteristic attribute of the reference picture, and controlling the generation effect by adjusting the distance between the target picture and the reference picture.
Summary of the user's steps:
1. The user selects a target picture (the original picture intended for the generation task) and reference pictures (pictures providing the features).
2. The user places the reference pictures in the central user workspace.
3. The user adjusts the distances (see distance-weight calculation) and attributes (see attribute transfer) of the reference pictures to control the generated effect.
4. Iterate until the desired result is obtained.
Experiments and user studies show that the image generation tool of this embodiment outperforms existing tools in degrees of freedom, ease of reaching the goal, satisfaction with the results, and users' willingness to use it. FIG. 9 shows an example of the generated effect.
The foregoing is merely a preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (7)
1. A method for generating an interactive image based on a spatial layout, comprising:
importing a target image and a reference image; wherein the target image is the original image of the generation task desired by the user, and the reference image is an image for providing characteristic attributes for the target picture;
transferring the characteristic attribute of the reference image to the target image, and adjusting the influence weight of the characteristic attribute of the reference image on the target image by controlling the distance between the target image and the reference image;
the method for adjusting the influence weight of the characteristic attribute of the reference image on the target image by controlling the distance between the target image and the reference image comprises the following steps:
$$w_i = \frac{k}{\lVert tar - ref_i \rVert}, \qquad I_{out} = G\left(\mathrm{Code}_t + \sum_{i=1}^{n} w_i \cdot \mathrm{Code}_i\right)$$
where $w_i$ is the distance weight, $k$ is a constant, $tar$ is the (x, y) coordinates of the target image, $ref_i$ is the (x, y) coordinates of the i-th reference image, $I_{out}$ is the finally generated picture, $G$ is the generator, $n$ is the number of reference images, and $\mathrm{Code}_i$ is the latent vector of the i-th reference image;
and generating a new target image according to the final weight.
2. The interactive image generation method based on spatial layout according to claim 1, wherein the characteristic attributes of the reference image include: local and global properties;
the local attributes include: eyes, nose, mouth and hair;
the global attributes include: makeup, age, face shape and head orientation.
3. The spatial layout-based interactive image generation method according to claim 2 wherein transferring the local attribute of the reference image to the target image comprises:
providing a corresponding mask for each reference image by a mask preprocessing method; when a local attribute is selected from the reference image, identifying the mask corresponding to the selected local attribute; combining the identified mask with the reference image to extract the region of the local attribute;
adding the extracted region to the target image to create a new input image; processing the input image and the target image using a pre-trained encoder to generate two corresponding latent vectors Code_t and Code_i;
adding Code_t and Code_i by weighting and inputting the result into a pre-trained image generator to generate the local-attribute transfer result.
4. The spatial layout-based interactive image generation method according to claim 2 wherein transferring global properties of the reference image to the target image comprises:
selecting, from a pre-collected image dataset, a base image aligned with the reference image; wherein the base image is an aligned image that differs from the reference image in only one global attribute and is used for extracting the global features of the reference image;
inputting the base image, the target image and the reference image into a pre-trained encoder to obtain latent vectors Code_t, Code_r and Code_b;
extracting a representation of the selected global attribute by subtracting Code_b from Code_r, and adding Code_t to the difference to generate a new latent code;
inputting the new latent code into a pre-trained image generator to generate a single global-attribute transfer image.
5. A spatial layout-based interactive image generation system for implementing the spatial layout-based interactive image generation method of any of claims 1-4, the system comprising: the device comprises an importing module, an adjusting module and a generating module;
the importing module is used for importing a target image and a reference image; wherein the target image is the original image of the generation task desired by the user, and the reference image is an image for providing characteristic attributes for the target picture;
the adjusting module is used for transferring the characteristic attribute of the reference image to the target image and adjusting the influence weight of the characteristic attribute of the reference image on the target image by controlling the distance between the target image and the reference image;
the generation module is used for generating a new target image based on the adjusted target image and the reference image.
6. The interactive image generation system based on spatial layout according to claim 5 wherein the characteristic attributes of the reference image comprise: local and global properties;
the local attributes include: eyes, nose, mouth and hair;
the global attributes include: makeup, age, face shape and head orientation.
7. A method of using a spatial layout based interactive image generation system, wherein the spatial layout based interactive image generation system of any of claims 5-6 is applied, the method of using comprising:
importing a target picture and a reference picture;
and selecting the characteristic attribute of the reference picture, and controlling the generation effect by adjusting the distance between the target picture and the reference picture.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410115444.3A CN117649461B (en) | 2024-01-29 | 2024-01-29 | Interactive image generation method and system based on space layout and use method thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410115444.3A CN117649461B (en) | 2024-01-29 | 2024-01-29 | Interactive image generation method and system based on space layout and use method thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117649461A true CN117649461A (en) | 2024-03-05 |
CN117649461B CN117649461B (en) | 2024-05-07 |
Family
ID=90048042
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410115444.3A Active CN117649461B (en) | 2024-01-29 | 2024-01-29 | Interactive image generation method and system based on space layout and use method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117649461B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111814566A (en) * | 2020-06-11 | 2020-10-23 | 北京三快在线科技有限公司 | Image editing method, image editing device, electronic equipment and storage medium |
CN113426129A (en) * | 2021-06-24 | 2021-09-24 | 网易(杭州)网络有限公司 | User-defined role appearance adjusting method, device, terminal and storage medium |
CN113744257A (en) * | 2021-09-09 | 2021-12-03 | 展讯通信(上海)有限公司 | Image fusion method and device, terminal equipment and storage medium |
CN116824625A (en) * | 2023-05-29 | 2023-09-29 | 北京交通大学 | Target re-identification method based on generation type multi-mode image fusion |
CN116862759A (en) * | 2023-06-19 | 2023-10-10 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Personalized portrait generation system and method based on generation countermeasure network |
CN117235114A (en) * | 2023-09-20 | 2023-12-15 | 江苏华真信息技术有限公司 | Retrieval method based on cross-modal semantic and mixed inverse fact training |
Also Published As
Publication number | Publication date |
---|---|
CN117649461B (en) | 2024-05-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||