CN114119348A - Image generation method, apparatus and storage medium

Image generation method, apparatus and storage medium

Info

Publication number
CN114119348A
Authority
CN
China
Prior art keywords
image
object image
reference object
point region
design point
Prior art date
Legal status
Pending
Application number
CN202111162116.1A
Other languages
Chinese (zh)
Inventor
李智康
周慧玲
白帅
周畅
杨红霞
周靖人
Current Assignee
Alibaba Cloud Computing Beijing Co Ltd
Original Assignee
Alibaba Cloud Computing Beijing Co Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Cloud Computing Beijing Co Ltd filed Critical Alibaba Cloud Computing Beijing Co Ltd
Priority to CN202111162116.1A
Publication of CN114119348A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/04Context-preserving transformations, e.g. by using an importance map
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • G06T11/001Texturing; Colouring; Generation of texture or colour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00Indexing scheme for image generation or computer graphics
    • G06T2210/16Cloth

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Embodiments of the invention provide an image generation method, a device and a storage medium. The method includes: aligning a basic object image and a reference object image, and acquiring a design point region of the reference object image and a non-design point region of the basic object image; extracting a first group of style vectors corresponding to the basic object image; extracting a second group of style vectors according to the design point region of the reference object image and the non-design point region of the basic object image; and generating a fused image based on the fusion of the first group of style vectors and the second group of style vectors, wherein the fused image incorporates the structural features of the design point region of the reference object image. In this way, the design details of a local area of the reference object are transferred onto the basic object automatically and efficiently.

Description

Image generation method, apparatus and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image generation method, an image generation device, and a storage medium.
Background
In areas such as apparel design and home decoration design, designers often need to borrow local designs from existing works when creating new designs.
For example, in the process of designing a garment, a designer usually borrows some local designs of an existing style to create a new style. However, when editing the image of a new garment design, the designer has to refer to the local design details of the existing style and manually draw them into the new design, so that these local detail features are integrated into the design being drawn. This cannot support large-scale batch design, and the editing efficiency is low.
Disclosure of Invention
The embodiment of the invention provides an image generation method, image generation equipment and a storage medium, which are used for improving the image generation efficiency.
In a first aspect, an embodiment of the present invention provides an image generation method, where the method includes:
aligning a basic object image and a reference object image, and acquiring a design point region of the reference object image and a non-design point region of the basic object image;
extracting a first group of style vectors corresponding to the basic object image;
extracting a second group of style vectors according to the design point region of the reference object image and the non-design point region of the basic object image;
and generating a fused image based on the fusion of the first group of style vectors and the second group of style vectors, wherein the fused image incorporates the structural features of the design point region of the reference object image.
In a second aspect, an embodiment of the present invention provides an image generating apparatus, including:
an acquisition module, configured to align a basic object image and a reference object image, and to acquire a design point region of the reference object image and a non-design point region of the basic object image;
an extraction module, configured to extract a first group of style vectors corresponding to the basic object image, and to extract a second group of style vectors according to the design point region of the reference object image and the non-design point region of the basic object image;
and a generation module, configured to generate a fused image based on the fusion of the first group of style vectors and the second group of style vectors, wherein the fused image incorporates the structural features of the design point region of the reference object image.
In a third aspect, an embodiment of the present invention provides an electronic device, including: the device comprises an input device, a processor and a display screen;
the input device is coupled to the processor and the display screen and used for inputting a basic object image and a reference object image;
the processor is used for aligning a basic object image and a reference object image, and acquiring a design point region of the reference object image and a non-design point region of the basic object image; extracting a first group of style vectors corresponding to the basic object image; extracting a second group of style vectors according to the design point region of the reference object image and the non-design point region of the basic object image; and generating a fused image based on the fusion of the first group of style vectors and the second group of style vectors, wherein the fused image incorporates the structural features of the design point region of the reference object image;
the display screen is used for displaying the basic object image, the reference object image and the fusion image.
In a fourth aspect, an embodiment of the present invention provides a non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to implement at least the image generation method according to the first aspect.
In a fifth aspect, an embodiment of the present invention provides an image generating method, where the method includes:
responding to a request of calling an image generation service interface by user equipment, and executing the following steps by using a processing resource corresponding to the image generation service interface:
aligning a basic object image and a reference object image, and acquiring a design point region of the reference object image and a non-design point region of the basic object image; extracting a first group of style vectors corresponding to the basic object image; extracting a second group of style vectors according to the design point region of the reference object image and the non-design point region of the basic object image; and generating a fused image based on the fusion of the first group of style vectors and the second group of style vectors, wherein the fused image incorporates the structural features of the design point region of the reference object image.
In a sixth aspect, an embodiment of the present invention provides an image generating method, where the method includes:
aligning a basic object image and a reference object image, and respectively acquiring a design point region and a non-design point region of the basic object image and the reference object image;
extracting a first group of style vectors corresponding to the non-design point region in the basic object image;
extracting a second group of style vectors corresponding to the design point region in the reference object image;
and generating a fused image based on the fusion of the first group of style vectors and the second group of style vectors, wherein the fused image incorporates the structural features of the design point region of the reference object image.
In the scheme provided by the embodiments of the invention, suppose a user wants to edit an image of a certain basic object and, during editing, to transfer the structural features (such as shapes, outlines, patterns, folds and the like) of the design point region of a reference object onto the basic object. A first group of style vectors corresponding to the basic object image is extracted, and a second group of style vectors is extracted by combining the design point region of the reference object image with the non-design point region of the basic object image, where the second group of style vectors contains the style vectors of the design point region in the reference object image and the style vectors of the non-design point region in the basic object image. Finally, the first group and the second group of style vectors are fused, and a fused image serving as the editing result is generated from the fused third group of style vectors. In this way, the structural features of the design point region of the reference object are transferred onto the basic object automatically and efficiently, and the resulting editing result incorporates the local design details of the reference object while retaining the original overall design style of the basic object.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.
Fig. 1 is a flowchart of an image generation method according to an embodiment of the present invention;
FIG. 2 is a schematic illustration of a mask image;
FIG. 3a is a schematic diagram of hiding a non-design point region in a reference object image;
FIG. 3b is a schematic diagram of hiding a corresponding region in a base object image;
FIG. 4 is a schematic diagram of a style vector optimization process according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of another style vector optimization process provided by an embodiment of the invention;
FIG. 6 is a schematic diagram of a blended image;
FIG. 7 is a schematic illustration of an edited image;
fig. 8 is a schematic diagram illustrating an implementation process of an image generation method according to an embodiment of the present invention;
fig. 9 is a schematic application diagram of an image generation method according to an embodiment of the present invention;
FIG. 10 is a flowchart of an image generation method according to an embodiment of the present invention;
fig. 11 is a schematic structural diagram of an image generating apparatus according to an embodiment of the present invention;
fig. 12 is a schematic structural diagram of an electronic device according to this embodiment;
fig. 13 is a schematic structural diagram of another electronic device provided in this embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In addition, the sequence of steps in each method embodiment described below is only an example and is not strictly limited.
The image generation method provided by the embodiments of the present invention can be executed by an electronic device. The electronic device can be a terminal device such as a PC (personal computer) or a notebook computer, or a server. The server may be a physical server or a virtual server, and it may be deployed on the user side or in the cloud.
The scheme provided by the embodiments of the present invention can be used to generate an image of the basic object with a specific editing effect. In brief, a user (in this document, a designer with an image design requirement) obtains an image containing a basic object (referred to as the basic object image) and an image containing a reference object (referred to as the reference object image), and marks the design point region whose design style in the reference object image is to be referenced. By executing the scheme provided by the embodiments of the present invention, the structural features presented in the design point region are automatically migrated to the corresponding region in the basic object image, so that the finally generated image retains the original overall design style of the basic object and incorporates the local design details of the design point region of the reference object.
The visual features of an image can be divided into three levels: low, middle and high. The low-level and middle-level features can be regarded as visual features reflecting the macro level of the image, while the high-level features reflect the micro level of the image. The structural features described above correspond to the low-level and middle-level features. Taking a garment scene as an example, the low-level and middle-level visual features of a garment may include, for example, shapes, outlines, particular structures and large-scale patterns, and the high-level visual features may include, for example, color and texture. Saying that the image obtained by automatically editing the basic image retains the original overall design style of the basic object while fusing in the local design details of the design point region of the reference object can therefore be understood, in brief, as follows: the visual characteristics of the edited image such as color, texture and material are consistent with those of the basic object, while the structural characteristics of the part of the edited image corresponding to the design point region are consistent with those of the design point region.
In different application scenarios, the basic object and the reference object may be different. For example, in a costume design scenario, the base object and the reference object are two costumes with different styles, such as two jeans, two T-shirts, and the like. In a home decoration design scenario, the base object and the reference object may be decoration design drawings of two living rooms, respectively.
The following embodiments are provided to describe the implementation of the image generation method provided in the present application in detail.
Fig. 1 is a flowchart of an image generating method according to an embodiment of the present invention, as shown in fig. 1, the method may include the following steps:
101. and aligning the basic object image and the reference object image, and acquiring a design point region of the reference object image and a non-design point region of the basic object image.
102. And extracting a first group of style vectors corresponding to the basic object image.
103. And extracting a second group of style vectors according to the design point region of the reference object image and the non-design point region of the basic object image.
104. And generating a fused image based on the fusion of the first group of style vectors and the second group of style vectors, wherein the fused image incorporates the structural features of the design point region of the reference object image.
Firstly, a basic object image and a reference object image are obtained, and the design point region to be referenced is marked in the reference object image. The basic object image contains the object that needs to be further edited to present a specific effect; in practical applications, it can be an image obtained by directly photographing the basic object, or an image designed independently by the user. The reference object image may be a previously edited image containing the reference object, or an image obtained by directly photographing the reference object. To facilitate the image processing of the subsequent models, the basic object image and the reference object image need to be adjusted to the same size.
In practical applications, the basic object and the reference object may have different shapes, and in order to ensure accuracy of subsequent processing, in the image generation scheme provided in the embodiment of the present invention, the basic object image and the reference object image are aligned first, that is, the reference object image is aligned to the basic object image.
In summary, the image alignment process is: feature point detection is performed on the base object image and the reference object image, respectively, and the base object image and the reference object image are aligned based on the detected feature points.
Optionally, the process of detecting the feature point may be specifically implemented as:
respectively carrying out sparse feature point detection on the basic object image and the reference object image to obtain a plurality of feature points;
respectively carrying out edge detection on the basic object image and the reference object image to obtain the edge of the basic object and the edge of the reference object;
a plurality of feature points are sampled on the edge of the base object and the edge of the reference object, respectively.
That is, first, a feature point detection model (for example, a model trained on a dataset such as DeepFashionV1 that can perform sparse feature point detection) may be used to perform sparse feature point detection on the basic object image and the reference object image respectively, so as to obtain a plurality of feature points. Since the number of feature points obtained this way is small, the feature points can be densified to ensure the image alignment effect. The feature points may be densified through edge extraction: edge detection is performed on the basic object image and the reference object image respectively to obtain the edge of the basic object and the edge of the reference object, and then a plurality of feature points are sampled along the edge of the basic object and the edge of the reference object, completing the densification of the feature points.
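For illustration only, the densification step can be sketched roughly in Python as follows, assuming OpenCV is available; the sparse landmark detector itself is treated as a black box, and all names here are illustrative rather than the patent's own implementation:

```python
import cv2
import numpy as np

def densify_feature_points(image_bgr, num_edge_samples=100):
    """Sample extra feature points along the object's edges.

    The sparse landmark detector (e.g. a model trained on a fashion
    landmark dataset) is assumed to run elsewhere; this sketch only
    shows the edge-based densification described in the text.
    """
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)          # binary edge map of the object
    ys, xs = np.nonzero(edges)                # coordinates of all edge pixels
    if len(xs) == 0:
        return np.empty((0, 2), dtype=np.float32)
    # Evenly sample a fixed number of points along the detected edge pixels
    idx = np.linspace(0, len(xs) - 1, num_edge_samples).astype(int)
    return np.stack([xs[idx], ys[idx]], axis=1).astype(np.float32)
```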
It should be noted that, in a certain application scenario, the base object and the reference object are the same kind of object, such as jeans. The serial numbers of the feature points corresponding to different positions on a certain object can be preset, for example, the serial numbers of the feature points from the left top point of the trouser waist to the right top point of the trouser waist are 1-8 in sequence, the serial number of the feature point at the position of the notch is 9, the serial numbers of the feature points from top to bottom on the outer side of the trouser leg on the left side are 10-40 in sequence, and the like. Based on the numbering rule, after the plurality of feature points included in the basic object image and the reference object image are determined, the number corresponding to each feature point can be determined. The feature points having the same number on the two images are a pair of feature points having a correspondence relationship.
Then, according to the correspondence between the feature points on the basic object image and those on the reference object image, the transformation parameters can be determined, and the reference object image is transformed according to these parameters to align it with the basic object image.
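A minimal sketch of this alignment step under the same assumptions; a homography estimated with OpenCV is used here purely as one example of the transformation parameters, since the patent does not prescribe a specific transform:

```python
import cv2
import numpy as np

def align_reference_to_base(reference_img, base_points, reference_points):
    """Warp the reference object image onto the base object image.

    base_points / reference_points: (N, 2) float32 arrays of corresponding
    feature points (the same numbering on both images, N >= 4).
    """
    H, _ = cv2.findHomography(reference_points, base_points, cv2.RANSAC)
    h, w = reference_img.shape[:2]
    # The base and reference images are assumed to share the same size
    return cv2.warpPerspective(reference_img, H, (w, h))
```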
By the above feature point detection and image alignment processing, even if the base object and the reference object are different in shape, the style vector fusion of the respective two images can be performed.
In order to generate an image that retains the original design style of the basic object and incorporates the local structural features of the design point region of the reference object, on the one hand, style vector (also called "latent vector") extraction needs to be performed on the basic object image to obtain a first group of style vectors; on the other hand, a second group of style vectors also needs to be extracted, which includes the style vectors of the design point region of the reference object and the style vectors of the non-design point region in the basic object image. Since the basic object image and the reference object image are images of the same size, after the design point region in the reference object image is marked, the image region in the basic object image corresponding to the design point region can be regarded as the design point region of the basic object image, and the remaining image region is referred to as the non-design point region of the basic object image.
In practical applications, the style vectors may be extracted by using a neural network model such as the style-based generative adversarial network model (StyleGAN) or an improved variant of it. The network model can extract style vectors for multiple (e.g., 18) levels of the input image.
The basic object image is input into a StyleGAN model, and the style vectors of the plurality of levels output by the model are called the first group of style vectors (i.e., a plurality of latent vectors).
In fact, the input and output of the StyleGAN model are interchangeable: after an image is input into the StyleGAN model, the model can extract the style vectors of multiple levels of the input image, and after style vectors of multiple levels are input into the StyleGAN model, the model can output a corresponding generated image. Thus, the style vectors affect the visual characteristics of the generated image. By extracting the two groups of style vectors described above and fusing them, a generated image that realizes the local design style migration effect can be produced based on the fused style vectors; this generated image is called the fused image in the embodiments of the present invention.
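The embodiments do not fix how the first group of style vectors is obtained from the basic object image; one common, purely illustrative approach is optimization-based GAN inversion, sketched below under the assumption of a pretrained StyleGAN-like generator G that maps an 18x512 W+ latent to an image, and an LPIPS perceptual loss:

```python
import torch
import lpips  # perceptual-similarity package; its use here is an assumption

def invert_image(G, target_img, num_layers=18, latent_dim=512, steps=500, lr=0.01):
    """Optimize a W+ latent (the 'first group of style vectors') so that
    G(latent) reproduces the base object image. G's interface is assumed."""
    device = target_img.device
    latent = torch.zeros(1, num_layers, latent_dim, device=device, requires_grad=True)
    percept = lpips.LPIPS(net='vgg').to(device)
    opt = torch.optim.Adam([latent], lr=lr)
    for _ in range(steps):
        generated = G(latent)                        # assumed W+ -> image call
        loss = percept(generated, target_img).mean() # perceptual reconstruction loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return latent.detach()
```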
To extract the second group of style vectors, the design point region of the reference object image and the non-design point region of the base object image are acquired, and the acquiring step can be implemented by using a mask image corresponding to the reference object image. Upon obtaining the base object image and the reference object image, a mask image corresponding to the reference object image may be generated based on the design point region marked in the reference object image.
For ease of understanding, the meaning of the mask image is exemplarily described with reference to fig. 2. As shown in fig. 2, the mask image is a binarized black-and-white image whose size is equal to that of the reference object image. Assuming that the reference object image is an image containing a pair of trousers as illustrated in the figure, and the user selects a region of the trouser legs as the design point region, then in the mask image the white pixel region (RGB = (1,1,1)) corresponds to the design point region, and the remaining black pixel region (RGB = (0,0,0)) corresponds to the non-design point region in the reference object image, i.e., the region other than the design point region. The mask image generation method may be implemented by referring to the related art and is not described here.
The obtaining of the design point region of the reference object image and the non-design point region of the base object image may be specifically a step of obtaining, based on the mask image, a first image containing the design point region of the reference object image and a second image containing the non-design point region of the base object image. Thereafter, a second set of style vectors is extracted based on the first image and the second image.
Here, the sizes of the first image and the second image are equal to the sizes of the reference object image and the base object image, and in brief, the first image corresponds to an image formed by hiding the non-design point region in the reference object image and exposing only the design point region, and the second image corresponds to an image formed by hiding the design point region in the base object image and exposing only the non-design point region.
Specifically, the first image containing the design point region of the reference object image may be acquired by multiplying the mask image by the reference object image to hide the non-design point region of the reference object image. By multiplying the exclusive or result of the mask image with the base object image, the design point region of the base object image is hidden, thereby acquiring a second image containing the non-design point region of the base object image. Wherein the mask image has the same size as the reference object image and the base object image.
The mask image is multiplied by the reference object image by multiplying RGB values of corresponding pixels in the two images. Since only the white image region where RGB is (1,1,1) and the black image region where RGB is (0,0,0) are included in the mask image, the result of multiplying the reference object image by the mask image is: an image area (i.e., a design point area) corresponding to the white image area in the reference object image remains unchanged, but image areas (i.e., non-design point areas) corresponding to the black image area in the reference object image are all set to black, thereby achieving the purpose of hiding the non-design point area in the reference object image and exposing only the design point area, as shown in fig. 3 a.
If the mask image is denoted as M, the exclusive-or result of the mask image can be expressed as 1-M. In short, the exclusive-or result of the mask image is equivalent to subtracting the RGB value of each pixel from 1, so that the pixels that are originally white in the mask image become black, and the pixels that are originally black become white.
The process of multiplying the exclusive-or result of the mask image by the basic object image is similar to the process of multiplying the mask image by the reference object image, and is not repeated here. The multiplication result is: the pixels in the design point region of the basic object image are all set to black, while the pixels in the non-design point region remain unchanged, thereby hiding the design point region and exposing only the non-design point region, as shown in fig. 3 b.
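A minimal sketch of the two masking operations described above, assuming the mask M stores values in {0, 1} and all three images have the same size:

```python
import numpy as np

def build_masked_pair(base_img, reference_img, mask):
    """Return (first_image, second_image) as described in the text.

    mask: array of shape (H, W, 3) with 1s in the design point region
    and 0s elsewhere; base_img / reference_img: same-sized arrays.
    """
    first_image = reference_img * mask      # keep only the design point region
    second_image = base_img * (1 - mask)    # keep only the non-design point region
    return first_image, second_image
```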
The process of extracting the second set of style vectors based on the generated first and second images may be implemented as:
and (3) iteratively executing the following optimization process of the second group of style vectors until a second group of style vectors meeting the requirement is obtained:
inputting the first image and the mixed image into a classification model to determine a first loss function value through the classification model; the mixed image is an image generated based on a second group of style vectors to be optimized, and a loss function corresponding to the classification model is used for measuring the perception similarity of the two input images;
inputting a second image and the blended image into the classification model to determine a second loss function value by the classification model;
the first loss function value and the second loss function value are combined to optimize a second set of style vectors.
As can be seen from the above description, in the embodiment of the present invention, the second set of style vectors needs to be continuously optimized to obtain the second set of style vectors that finally satisfy the condition. Initially, a second set of style vectors may be initialized to a plurality of random vector values.
For ease of understanding, the optimization process for the second set of style vectors described above is illustrated in conjunction with FIG. 4.
The first round of iteration: initially, a group of style vectors (e.g., 18 style vectors) may be randomly initialized and input into a generative network model illustrated in the figure, such as the StyleGAN model mentioned above, which outputs an image called the mixed image. Then, the first image and the mixed image are input into a classification model as a pair; the classification model judges whether the two input images belong to the same class, i.e., for the currently input mixed image and first image, whether the mixed image belongs to the same class as the first image. The way to measure whether two input images belong to the same class is to calculate the perceptual similarity between them: if the perceptual similarity is high, the two images are regarded as the same class. Therefore, the loss function corresponding to the classification model measures the perceptual similarity of the two input images. In fig. 4, the loss function value between the mixed image and the first image output by the classification model is assumed to be loss1. Similarly, the second image and the mixed image are also input into the classification model as a pair, and in fig. 4 the loss function value between the mixed image and the second image output by the classification model is assumed to be loss2. The sum of the two loss function values is denoted as the total loss, and based on this total loss the second group of style vectors is optimized through a back-propagation process.
The second round of iteration: the second group of style vectors optimized in the first round is input into the generative network model to obtain a second mixed image. Similarly, the first image and the second mixed image are input into the classification model as a pair, and the classification model outputs the corresponding loss function value, assumed to be loss1'; the second image and the second mixed image are input into the classification model as a pair, and the classification model outputs the corresponding loss function value, assumed to be loss2'. The sum of the two loss function values of this round is denoted as total loss', and based on it the second group of style vectors is further optimized through a back-propagation process.
And so on. Assume that at the Nth iteration the total loss function value falls below a set threshold; the iteration is then considered finished, and the second group of style vectors used in the Nth iteration is taken as the finally optimized second group of style vectors for the subsequent style vector fusion process.
In practical applications, the classification model may be, for example, a VGG model, but is not limited to this. The loss function of the classification model may, for example, be the LPIPS loss, which measures high-level perceptual consistency.
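A compact, illustrative sketch of this iterative optimization, again assuming a StyleGAN-like generator G, an LPIPS loss standing in for the classification model's loss, and the first and second images built as in the earlier masking sketch; masking the mixed image before each comparison is an additional assumption of this sketch, not something the text spells out:

```python
import torch
import lpips

def optimize_second_style_vectors(G, first_image, second_image, mask,
                                  num_layers=18, latent_dim=512,
                                  steps=1000, lr=0.01, threshold=0.05):
    """Iteratively optimize the second group of style vectors (a W+ latent)."""
    device = first_image.device
    latent = torch.randn(1, num_layers, latent_dim, device=device, requires_grad=True)
    percept = lpips.LPIPS(net='vgg').to(device)
    opt = torch.optim.Adam([latent], lr=lr)
    for _ in range(steps):
        mixed = G(latent)                                  # mixed image for this round
        # Compare against the exposed regions only (assumption of this sketch)
        loss1 = percept(mixed * mask, first_image).mean()
        loss2 = percept(mixed * (1 - mask), second_image).mean()
        total = loss1 + loss2
        opt.zero_grad()
        total.backward()
        opt.step()
        if total.item() < threshold:                       # stop once the total loss is small enough
            break
    return latent.detach()
```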
In an alternative embodiment, to further improve the quality of the second group of style vectors, two different types of loss functions may be used as constraints: the loss that measures the above-mentioned high-level perceptual consistency, and a per-pixel mean squared error loss that measures low-level texture consistency.
Based on this, the process of extracting the second set of style vectors based on the first image and the second image may also be implemented as:
and (3) iteratively executing the following optimization process of the second group of style vectors until a second group of style vectors meeting the requirement is obtained:
inputting the first image and the mixed image into a classification model to determine a first loss function value through the classification model; the mixed image is an image generated based on a second group of style vectors to be optimized, and a loss function corresponding to the classification model is used for measuring the perception similarity of the two input images;
inputting a second image and the blended image into the classification model to determine a second loss function value by the classification model;
comparing the pixels of the first image and the blended image to determine a third loss function value; and, performing a pixel comparison of the second image and the blended image to determine a fourth loss function value;
the second set of style vectors is optimized in combination with the first, second, third, and fourth loss function values.
For ease of understanding, the optimization process for the second set of style vectors described above is illustrated in conjunction with FIG. 5.
The first round of iteration: initially, a group of style vectors (e.g., 18 style vectors) may be randomly initialized and input into a generative network model illustrated in the figure, which outputs an image called the mixed image. Then, the first image and the mixed image are input into the classification model as a pair, and the loss function value between the mixed image and the first image output by the classification model is assumed to be loss1. Similarly, the second image and the mixed image are also input into the classification model as a pair, and the loss function value between the mixed image and the second image output by the classification model is assumed to be loss2. The first image and the mixed image are also input as a pair into a pixel comparison module, which compares the two input images pixel by pixel and determines a corresponding loss function value based on the set loss function and the pixel comparison result, denoted as loss3. Similarly, the second image and the mixed image are input into the pixel comparison module as a pair, and the loss function value output by the pixel comparison module is assumed to be loss4. The sum of the four loss function values is denoted as the total loss, and based on this total loss the second group of style vectors is optimized through a back-propagation process.
The second round of iteration: the second group of style vectors optimized in the first round is input into the generative network model to obtain a second mixed image. Similarly, the first image and the second mixed image are input into the classification model as a pair, and the classification model outputs the corresponding loss function value, assumed to be loss1'; the second image and the second mixed image are input into the classification model as a pair, and the classification model outputs the corresponding loss function value, assumed to be loss2'. The first image and the second mixed image are also input as a pair into the pixel comparison module, whose output loss function value is assumed to be loss3'; similarly, the second image and the second mixed image are input as a pair into the pixel comparison module, whose output loss function value is assumed to be loss4'. The sum of the four loss function values of this round is denoted as total loss', and based on it the second group of style vectors is further optimized through a back-propagation process.
And so on. Assume that at the Nth iteration the total loss function value falls below a set threshold; the iteration is then considered finished, and the second group of style vectors used in the Nth iteration is taken as the finally optimized second group of style vectors for the subsequent style vector fusion process.
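Under the same assumptions as the earlier sketch, this variant only changes the loss computation; a sketch of the combined loss for one iteration:

```python
import torch
import torch.nn.functional as F

def combined_loss(percept, mixed, first_image, second_image, mask):
    """LPIPS terms (loss1, loss2) plus per-pixel MSE terms (loss3, loss4).

    Masking the mixed image before each comparison is an assumption of
    this sketch, not something the text spells out.
    """
    loss1 = percept(mixed * mask, first_image).mean()
    loss2 = percept(mixed * (1 - mask), second_image).mean()
    loss3 = F.mse_loss(mixed * mask, first_image)
    loss4 = F.mse_loss(mixed * (1 - mask), second_image)
    return loss1 + loss2 + loss3 + loss4
```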
It can be understood that, in the process of continuously optimizing the second group of style vectors, it is to be ensured that the mixed image generated based on the optimized second group of style vectors can be more similar to the first image and the second image, the first image exposes the design point region in the reference object image, and the second image exposes the part of the non-design point region in the basic object image. And the mixed image generated based on the optimized second group of style vectors has the following characteristics: a part of the image regions exhibit a stylistic characteristic that is very similar to the non-design point region in the base object image, while another part of the image regions exhibit a stylistic characteristic that is very similar to the design point region in the reference object image. For ease of understanding, the description is exemplified in conjunction with fig. 6.
Assume that the basic object is a pair of trousers with a blue vertical-stripe pattern and the reference object is a pair of trousers with a black plaid pattern, as shown in fig. 6. In the mixed image generated based on the optimized second group of style vectors, the image region corresponding to the design point region of the reference object image still presents the style characteristics of that design point region, namely the black plaid; while the remaining image regions present the style characteristics of the non-design point region in the basic object image, namely the blue vertical stripes.
After the optimized second group of style vectors, containing the style vectors of the design point region in the reference object image and the style vectors of the non-design point region in the basic object image, has been obtained, and the first group of style vectors extracted from the basic object image has been obtained, the first group and the second group of style vectors can be fused, and a fused image serving as the editing result of the basic object image can be generated from the fused third group of style vectors. The fused image thus presents the visual effect of the structural features of the design point region of the reference object being migrated to the corresponding position on the basic object.
Specifically, as described above, a set of style vectors is composed of a plurality of style vectors, such as style vectors of 18 levels in the first set of style vectors and the second set of style vectors, respectively. In the process of fusion, the fusion levels and the fusion proportions of the multi-layer style vectors respectively contained in the first group of style vectors and the second group of style vectors can be determined, and then the style vectors of the corresponding fusion levels are fused according to the fusion proportions.
In practical application, a large number of experiments can be performed in advance to determine the fusion level and the fusion proportion which accord with the current application scene. The fusion hierarchy is a hierarchy in which the style vectors of all the multiple hierarchies are fused, and may be all or part of the hierarchy. The fusion proportion is simply the fusion weight, i.e. the weight value used when the style vectors of a certain level in the first set of style vectors and the style vectors of the corresponding level in the second set of style vectors are fused together. In the testing stage or the training stage, the fusion level and the fusion proportion can be continuously adjusted until the determined fusion level and fusion proportion can enable the finally produced image to present the visual effect which is in line with the expectation. The finally generated image is an image generated based on a third group of style vectors obtained by fusing the two groups of style vectors by using the determined fusion level and fusion proportion.
It can be understood that, for a certain application scenario, the tested fusion level and fusion proportion may be applicable to different input situations in the application scenario, for example, in a design scenario of jeans, two jeans serving as a basic object and a reference object input this time may adopt the same fusion proportion and fusion level as another two jeans serving as a basic object and a reference object input next time. Therefore, the fusion level and the fusion proportion can be set in advance for different application scenes, and the adopted fusion level and the adopted fusion proportion can be selected for the current application scene.
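As a purely illustrative sketch of the layer-wise fusion, assuming both groups are W+ latents of shape (1, 18, 512); the concrete fusion levels and the single fusion weight used here are placeholders for the scenario-dependent values described above:

```python
import torch

def fuse_style_vectors(first_set, second_set, fusion_levels, fusion_weight=0.8):
    """Blend the two W+ latents level by level.

    first_set / second_set: tensors of shape (1, num_layers, latent_dim).
    fusion_levels: indices of the layers that are blended; the other layers
    keep the first set (the base object's style) unchanged.
    """
    fused = first_set.clone()
    for level in fusion_levels:
        fused[:, level] = (fusion_weight * second_set[:, level]
                           + (1.0 - fusion_weight) * first_set[:, level])
    return fused

# Example (illustrative choice): blend only the coarse structural layers,
# then generate the fused image from the fused latent.
# fused = fuse_style_vectors(w_first, w_second, fusion_levels=range(0, 7))
# fused_image = G(fused)
```

Blending only the coarse levels while leaving the remaining levels to the base object's latent is one way to keep color and texture unchanged, consistent with the behaviour described for fig. 7.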
For ease of understanding, the process of fusing the style vectors and the images generated based on the fused style vectors are illustrated in connection with fig. 7.
In fig. 7, still assuming that the basic object and the reference object are as illustrated in fig. 6, the first group of style vectors and the second group of style vectors are fused based on the determined fusion proportion and fusion levels, and the fusion result is referred to as the third group of style vectors. The third group of style vectors can be input into a generative network model illustrated in the figure to obtain the image output by that model, namely the fused image, i.e., the image obtained after the original basic object image is automatically edited. As shown in fig. 7, in the fused image generated from the fused group of style vectors, only the coarse-grained structural features of the design point region of the reference object are migrated to the corresponding position of the basic object, while fine-grained features such as color and texture are not migrated, so the fused image still retains the original design style of the basic object while incorporating the local design details of the reference object.
In summary, based on the solution provided by the embodiments of the present invention, when a user needs to edit an image containing a basic object into a new design by referring to a local design of a reference object, the user only needs to input the basic object image and the reference object image marked with the design point region, and the solution can be invoked to complete the image editing automatically. The resulting editing result retains the original design style of the basic object and incorporates the local design details of the reference object, which helps improve the efficiency of the design process.
The implementation of each step in the above embodiments is respectively described by way of example, and in order to fully understand the processing logic of the solution provided by the embodiments of the present invention, the overall implementation is illustrated in conjunction with fig. 8.
As shown in fig. 8, in practical applications, a user may load a basic object image and a reference object image through a terminal device such as a PC, a notebook computer, or the like, mark a design point region in the reference object image, and then invoke the image generation method provided by the embodiment of the present invention. The image generation method provided by the embodiment of the invention can be provided by an application program, so that a user can start the application program, load the two images on an image loading interface and then start the execution of the subsequent image generation process.
As described above with reference to fig. 8, the corresponding mask image may be generated based on the reference object image to which the design point region is marked, the first and second groups of style vectors may be extracted and fused, and the fused image as the editing result may be generated based on the fused style vectors.
As described above, the image generation method provided by the present invention can be executed in the cloud, where a plurality of computing nodes may be deployed, each with processing resources such as computation and storage. In the cloud, multiple computing nodes may be organized to provide one service, and one computing node may also provide one or more services. The cloud may provide a service by exposing a service interface, and the user calls the service interface to use the corresponding service. The service interface may take the form of a Software Development Kit (SDK), an Application Programming Interface (API), or other forms.
According to the scheme provided by the embodiment of the invention, the cloud end can be provided with a service interface of the image generation service, and the user calls the image generation service interface through the user equipment so as to trigger a request for calling the image generation service interface to the cloud end. The cloud determines the compute nodes that respond to the request, and performs the following steps using processing resources in the compute nodes:
aligning a basic object image and a reference object image, and acquiring a design point region of the reference object image and a non-design point region of the basic object image;
extracting a first group of style vectors corresponding to the basic object image;
extracting a second group of style vectors according to the design point region of the reference object image and the non-design point region of the basic object image;
and generating a fused image based on the fusion of the first group of style vectors and the second group of style vectors, wherein the fused image incorporates the structural features of the design point region of the reference object image.
For the detailed process of the image generation service interface executing the image generation processing by using the processing resource, reference may be made to the related description in the foregoing other embodiments, which is not described herein again.
In practical applications, the request can directly carry the basic object image and the reference object image marked with the design point region, and the cloud parses the corresponding images from the request. The cloud then generates the mask image and performs the subsequent steps. In addition, the cloud can store the models required during these steps in advance, so that they can be called to complete the corresponding processing.
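For illustration only, a client-side call to such an image generation service interface might look like the following; the endpoint URL, field names and response format are assumptions, since the patent does not define a concrete wire format:

```python
import base64
import requests

def call_image_generation_service(base_path, reference_path, mask_path,
                                  endpoint="https://example.com/api/image-generation"):
    """Upload the base object image and the marked reference object image,
    and return the fused image bytes. All field names are hypothetical."""
    def encode(path):
        with open(path, "rb") as f:
            return base64.b64encode(f.read()).decode("ascii")

    payload = {
        "base_object_image": encode(base_path),
        "reference_object_image": encode(reference_path),
        "design_point_mask": encode(mask_path),
    }
    resp = requests.post(endpoint, json=payload, timeout=60)
    resp.raise_for_status()
    return base64.b64decode(resp.json()["fused_image"])
```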
For ease of understanding, the description is illustrative with reference to FIG. 9. The user may invoke an image generation service interface through which a service request containing a base object image and a reference object image tagged with a design point area is uploaded by the user through a user device E1 illustrated in fig. 9. In the cloud, as shown in the figure, besides a plurality of computing nodes, a management node E2 running a management and control service is also deployed, after receiving a service request sent by user equipment E1, the management node E2 determines a computing node E3 responding to the service request, and after receiving the two images, the computing node E3 executes the steps of mask image generation, style vector extraction, fusion and the like, and finally outputs a fused image. The detailed implementation process refers to the description in the foregoing embodiments, and is not repeated herein. Thereafter, the computing node E3 sends the fused image to the user device E1, and the user device E1 displays the fused image, on the basis of which the user can perform further editing and the like.
Fig. 10 is a flowchart of an image generating method according to an embodiment of the present invention, and as shown in fig. 10, the method may include the following steps:
1001. and aligning the basic object image and the reference object image, and respectively acquiring a design point region and a non-design point region of the basic object image and the reference object image.
1002. And extracting a first group of style vectors corresponding to the non-design point region in the basic object image.
1003. And extracting a second group of style vectors corresponding to the design point region in the reference object image.
1004. And generating a fused image based on the fusion of the first group of style vectors and the second group of style vectors, wherein the fused image incorporates the structural features of the design point region of the reference object image.
In this embodiment, reference may be made to the related description in the foregoing other embodiments for the processing procedure of aligning the basic object image and the reference object image, which is not described herein again.
In this embodiment, the process of respectively obtaining the design point region and the non-design point region of the basic object image and the reference object image may be implemented by using the mask image described in the foregoing embodiment, and the specific implementation process may refer to the relevant description in the foregoing embodiment, which is not described herein again. It is assumed here that an acquired image including only the non-design point region of the base object image is referred to as an image a, and an acquired image including only the design point region of the reference object image is referred to as an image B.
Then the first set of style vectors may be obtained by performing style vector extraction for image a and the second set of style vectors may be obtained by performing style vector extraction for image B.
In this embodiment, a set of style vectors corresponding to the non-design point region of the base object image is used as a set of style vectors corresponding to the base object image, and a set of style vectors corresponding to the design point region of the reference object image is used as a set of style vectors corresponding to the reference object image.
The two groups of style vectors are fused, so that the fused group of style vectors contains both the style of the non-design point region of the basic object image and the style of the design point region of the reference object image. The fused image generated based on the fused style vectors contains both the structural features of the basic object and those of the design point region of the reference object image.
An image generation apparatus according to one or more embodiments of the present invention will be described in detail below. Those skilled in the art will appreciate that these means can each be constructed using commercially available hardware components and by performing the steps taught in this disclosure.
Fig. 11 is a schematic structural diagram of an image generating apparatus according to an embodiment of the present invention. As shown in fig. 11, the apparatus includes: an acquisition module 11, an extraction module 12 and a generation module 13.
The acquiring module 11 is configured to align a basic object image and a reference object image, and acquire a design point region of the reference object image and a non-design point region of the basic object image.
An extracting module 12, configured to extract a first group of style vectors corresponding to the basic object image; and extracting a second group of style vectors according to the design point region of the reference object image and the non-design point region of the basic object image.
And the generating module 13 is configured to generate a fused image based on the fusion of the first group of style vectors and the second group of style vectors, wherein the fused image incorporates the structural features of the design point region of the reference object image.
Optionally, in the process of acquiring the design point region of the reference object image and the non-design point region of the base object image, the acquiring module 11 is specifically configured to: acquiring a mask image of the reference object image, the mask image being generated based on the marked design point region of the reference object image; a first image containing a design point region of the reference object image and a second image containing a non-design point region of the base object image are acquired based on the mask image.
Optionally, the obtaining module 11 is specifically configured to: obtain a first image containing the design point region of the reference object image by multiplying the mask image and the reference object image; and obtain a second image containing the non-design point region of the base object image by multiplying the exclusive-or result of the mask image with the base object image.
Optionally, in the process of extracting the second group of style vectors, the extracting module 12 is specifically configured to iteratively execute the following optimization process of the second group of style vectors until a second group of style vectors meeting the requirement is obtained: inputting the first image and the mixed image into a classification model to determine a first loss function value by the classification model, wherein the mixed image is an image generated based on the second group of style vectors to be optimized, and the loss function corresponding to the classification model is used to measure the perceptual similarity of two input images; inputting the second image and the mixed image into the classification model to determine a second loss function value by the classification model; and combining the first loss function value and the second loss function value to optimize the second group of style vectors.
Optionally, in the optimization process of the second set of style vectors, the extraction module 12 is further configured to: comparing the pixels of the first image and the blended image to determine a third loss function value; and, performing a pixel comparison of the second image and the blended image to determine a fourth loss function value; optimizing a second set of style vectors in conjunction with the first, second, third, and fourth loss function values.
Optionally, in the process of fusing the first group of style vectors and the second group of style vectors, the generating module 13 is specifically configured to: determine the fusion levels and fusion proportions of the multi-layer style vectors contained in the first group of style vectors and the second group of style vectors, respectively; and fuse the style vectors of the corresponding fusion levels according to the fusion proportions.
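For illustration only, such a layer-wise fusion can be sketched as a per-level interpolation of the two groups of style vectors; the layer indices and mixing ratio below are arbitrary assumptions rather than values taken from the embodiments:

```python
import torch

def fuse_style_vectors(first_group, second_group, fuse_layers, ratio=0.8):
    # first_group, second_group: tensors of shape (num_layers, dim).
    # fuse_layers: indices of the levels that should take on the reference structure.
    # ratio: fusion proportion of the second group at those levels.
    fused = first_group.clone()
    for layer in fuse_layers:
        fused[layer] = (1 - ratio) * first_group[layer] + ratio * second_group[layer]
    return fused

# Example: fuse the first four (coarse, structure-controlling) levels at an 80% proportion.
# fused_w = fuse_style_vectors(w_base, w_optimized, fuse_layers=range(4), ratio=0.8)
```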
Optionally, in the process of aligning the base object image and the reference object image, the acquiring module 11 is configured to: perform feature point detection on the basic object image and the reference object image, respectively; and align the basic object image and the reference object image according to the detected feature points.
In the feature point detection process, the acquiring module 11 is specifically configured to: perform sparse feature point detection on the basic object image and the reference object image, respectively, to obtain a plurality of feature points; perform edge detection on the basic object image and the reference object image, respectively, to obtain the edge of the base object and the edge of the reference object; and sample a plurality of feature points on the edges of the base object and the reference object, respectively.
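One possible realization of this detection step, shown purely as a sketch with OpenCV (the detector choice and parameter values are illustrative assumptions, not values fixed by the embodiments), is:

```python
import cv2
import numpy as np

def detect_points(image, max_corners=200, edge_samples=100):
    # Detect sparse feature points and sample additional points along object edges.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Sparse feature point detection (Shi-Tomasi corners).
    corners = cv2.goodFeaturesToTrack(gray, maxCorners=max_corners,
                                      qualityLevel=0.01, minDistance=8)
    corners = corners.reshape(-1, 2) if corners is not None else np.empty((0, 2))
    # Edge detection followed by uniform sampling of points along the edges.
    edges = cv2.Canny(gray, 50, 150)
    edge_xy = np.argwhere(edges > 0)[:, ::-1].astype(np.float32)  # (x, y) pixel coordinates
    if len(edge_xy) > edge_samples:
        idx = np.linspace(0, len(edge_xy) - 1, edge_samples).astype(int)
        edge_xy = edge_xy[idx]
    return np.vstack([corners, edge_xy]).astype(np.float32)
```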
In the image alignment process, the acquiring module 11 is specifically configured to: determine transformation parameters according to the correspondence between the feature points on the basic object image and the reference object image; and transform the reference object image according to the transformation parameters so as to align it with the base object image.
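As a hedged sketch of this alignment step (the embodiments do not fix a particular transformation model; a partial affine transform robustly estimated with RANSAC is assumed here), corresponding point pairs could be used as follows:

```python
import cv2

def align_reference_to_base(reference_img, base_img, ref_points, base_points):
    # ref_points, base_points: float32 arrays of shape (N, 2) of corresponding feature points.
    # Estimate a partial affine (rotation + uniform scale + translation) transform.
    matrix, _ = cv2.estimateAffinePartial2D(ref_points, base_points, method=cv2.RANSAC)
    h, w = base_img.shape[:2]
    # Warp the reference image into the coordinate frame of the base image.
    return cv2.warpAffine(reference_img, matrix, (w, h))
```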
The apparatus shown in fig. 11 may perform the steps described in the foregoing embodiments; for the detailed execution process and technical effects, refer to the descriptions in the foregoing embodiments, which are not repeated herein.
In one possible design, the structure of the image generating apparatus shown in fig. 11 may be implemented as an electronic device which, as shown in fig. 12, may include: an input device 21, a processor 22, and a display screen 23.
The input device 21 is coupled to the processor 22 and the display screen 23, and is used for inputting a basic object image and a reference object image.
The processor 22 is configured to align the basic object image and the reference object image, and acquire a design point region of the reference object image and a non-design point region of the basic object image; extract a first group of style vectors corresponding to the basic object image; extract a second group of style vectors according to the design point region of the reference object image and the non-design point region of the basic object image; and generate a fused image based on the fusion of the first group of style vectors and the second group of style vectors, wherein the fused image incorporates the structural features of the design point region of the reference object image.
The display screen 23 is configured to display the basic object image, the reference object image, and the fusion image.
Fig. 13 is a schematic structural diagram of another electronic device provided in this embodiment, and as shown in fig. 13, the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the method steps 101-105 described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The input/output interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800. The sensor assembly 814 may also detect a change in the position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G or 4G or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the electronic device 800 to perform the above-described method is also provided. For example, the non-transitory computer-readable storage medium may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic or optical disk.
In addition, an embodiment of the present invention provides a non-transitory machine-readable storage medium having stored thereon executable code, which, when executed by a processor of an electronic device, causes the processor to implement at least the image generation method as provided in the foregoing embodiments.
The above-described apparatus embodiments are merely illustrative, wherein the units described as separate components may or may not be physically separate. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by means of software plus a necessary general hardware platform, and certainly can also be implemented by a combination of hardware and software. Based on this understanding, the part of the above technical solutions that in essence contributes to the prior art may be embodied in the form of a computer program product, which may be embodied on one or more computer-usable storage media (including, without limitation, disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (12)

1. An image generation method, comprising:
aligning a basic object image and a reference object image, and acquiring a design point region of the reference object image and a non-design point region of the basic object image;
extracting a first group of style vectors corresponding to the basic object image;
extracting a second group of style vectors according to the design point region of the reference object image and the non-design point region of the basic object image;
and generating a fused image based on the fusion of the first group of style vectors and the second group of style vectors, wherein the fused image is combined with the structural characteristics of the design point region of the reference object image.
2. The method of claim 1, wherein said obtaining a design point region of the reference object image and a non-design point region of the base object image comprises:
acquiring a mask image of the reference object image, the mask image being generated based on the marked design point region of the reference object image;
a first image containing a design point region of the reference object image and a second image containing a non-design point region of the base object image are acquired based on the mask image.
3. The method according to claim 2, wherein said acquiring a first image containing a design point region of the reference object image and a second image containing a non-design point region of the base object image based on the mask image comprises:
obtaining a first image containing the design point region of the reference object image by multiplying the mask image and the reference object image;
obtaining a second image containing the non-design point region of the base object image by multiplying the exclusive-or result of the mask image with the base object image.
4. The method of claim 3, wherein said extracting a second set of style vectors from said design point region of said reference object image and said non-design point region of said base object image comprises:
and (3) iteratively executing the following optimization process of the second group of style vectors until a second group of style vectors meeting the requirement is obtained:
inputting the first image and the blended image into a classification model to determine a first loss function value by the classification model; the mixed image is an image generated based on a second group of style vectors to be optimized, and a loss function corresponding to the classification model is used for measuring the perception similarity of two input images;
inputting the second image and the blended image into the classification model to determine a second loss function value by the classification model;
combining the first loss function value and the second loss function value to optimize the second group of style vectors.
5. The method of claim 4, wherein the optimization process of the second group of style vectors further comprises:
comparing the pixels of the first image and the blended image to determine a third loss function value; and performing a pixel comparison of the second image and the blended image to determine a fourth loss function value;
said combining the first loss function value and the second loss function value to optimize the second group of style vectors comprises:
optimizing the second group of style vectors in conjunction with the first, second, third, and fourth loss function values.
6. The method of claim 1, wherein aligning the base object image and the reference object image comprises:
respectively carrying out feature point detection on the basic object image and the reference object image;
and aligning the basic object image and the reference object image according to the detected feature points.
7. The method according to claim 6, wherein the performing feature point detection on the base object image and the reference object image respectively comprises:
respectively carrying out sparse feature point detection on the basic object image and the reference object image to obtain a plurality of feature points;
respectively carrying out edge detection on the basic object image and the reference object image to obtain the edge of the basic object and the edge of the reference object;
a plurality of feature points are sampled on the edges of the base object and the reference object, respectively.
8. The method of claim 7, wherein said aligning the base object image and the reference object image according to the detected feature points comprises:
determining transformation parameters according to the corresponding relation between the characteristic points on the basic object image and the reference object image;
transforming the reference object image according to the transformation parameters to align with the base object image.
9. The method of claim 1, wherein fusing the first group of style vectors and the second group of style vectors comprises:
determining fusion levels and fusion proportions of multi-layer style vectors contained in the first group of style vectors and the second group of style vectors respectively;
and fusing the style vectors of the corresponding fusion levels according to the fusion proportion.
10. An image generation method, comprising:
aligning a basic object image and a reference object image, and acquiring a non-design point region of the basic object image and a design point region of the reference object image, respectively;
extracting a first group of style vectors corresponding to the non-design point region in the basic object image;
extracting a second group of style vectors corresponding to the design point region in the reference object image;
and generating a fused image based on the fusion of the first group of style vectors and the second group of style vectors, wherein the fused image is combined with the structural characteristics of the design point region of the reference object image.
11. An electronic device, comprising: the device comprises an input device, a processor and a display screen;
the input device is coupled to the processor and the display screen and used for inputting a basic object image and a reference object image;
the processor is used for aligning a basic object image and a reference object image and acquiring a design point region of the reference object image and a non-design point region of the basic object image; extracting a first group of style vectors corresponding to the basic object image; extracting a second group of style vectors according to the design point region of the reference object image and the non-design point region of the basic object image; generating a fused image based on the fusion of the first group of style vectors and the second group of style vectors, wherein the fused image is combined with the structural features of the design point region of the reference object image;
the display screen is used for displaying the basic object image, the reference object image and the fusion image.
12. A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the image generation method of any one of claims 1 to 8.
CN202111162116.1A 2021-09-30 2021-09-30 Image generation method, apparatus and storage medium Pending CN114119348A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111162116.1A CN114119348A (en) 2021-09-30 2021-09-30 Image generation method, apparatus and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111162116.1A CN114119348A (en) 2021-09-30 2021-09-30 Image generation method, apparatus and storage medium

Publications (1)

Publication Number Publication Date
CN114119348A (en) 2022-03-01

Family

ID=80441714

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111162116.1A Pending CN114119348A (en) 2021-09-30 2021-09-30 Image generation method, apparatus and storage medium

Country Status (1)

Country Link
CN (1) CN114119348A (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110660037A (en) * 2018-06-29 2020-01-07 京东方科技集团股份有限公司 Method, apparatus, system and computer program product for face exchange between images
CN111311480A (en) * 2018-12-11 2020-06-19 北京京东尚科信息技术有限公司 Image fusion method and device
CN112419328A (en) * 2019-08-22 2021-02-26 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN111340745A (en) * 2020-03-27 2020-06-26 成都安易迅科技有限公司 Image generation method and device, storage medium and electronic equipment
CN111814566A (en) * 2020-06-11 2020-10-23 北京三快在线科技有限公司 Image editing method, image editing device, electronic equipment and storage medium
GB202103715D0 (en) * 2021-03-17 2021-04-28 British Broadcasting Corp Imaging processing using machine learning
CN113111947A (en) * 2021-04-16 2021-07-13 北京沃东天骏信息技术有限公司 Image processing method, apparatus and computer-readable storage medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023179074A1 (en) * 2022-03-25 2023-09-28 上海商汤智能科技有限公司 Image fusion method and apparatus, and electronic device, storage medium, computer program and computer program product

Similar Documents

Publication Publication Date Title
RU2577188C1 (en) Method, apparatus and device for image segmentation
US20210097715A1 (en) Image generation method and device, electronic device and storage medium
CN111932664B (en) Image rendering method and device, electronic equipment and storage medium
CN109889724B (en) Image blurring method and device, electronic equipment and readable storage medium
KR20230170722A (en) Garment segmentation
CN110944230B (en) Video special effect adding method and device, electronic equipment and storage medium
US10504264B1 (en) Method and system for combining images
KR20180073330A (en) Method and apparatus for creating user-created sticker, system for sharing user-created sticker
CN113810588B (en) Image synthesis method, terminal and storage medium
CN112219224B (en) Image processing method and device, electronic equipment and storage medium
US20230169739A1 (en) Light and rendering of garments
KR20230147724A (en) Whole body segmentation
KR20230147721A (en) Full body visual effects
WO2023045961A1 (en) Virtual object generation method and apparatus, and electronic device and storage medium
CN113450431B (en) Virtual hair dyeing method, device, electronic equipment and storage medium
KR20230027237A (en) Reconstruction of 3D object models from 2D images
KR20230078755A (en) Object Re-Illumination Using Neural Networks
CN114782296B (en) Image fusion method, device and storage medium
CN115272604A (en) Stereoscopic image acquisition method and device, electronic equipment and storage medium
CN113902869A (en) Three-dimensional head grid generation method and device, electronic equipment and storage medium
CN114119348A (en) Image generation method, apparatus and storage medium
KR20240125620A (en) Real-time upper body clothing exchange
CN113822798B (en) Method and device for training generation countermeasure network, electronic equipment and storage medium
CN111429551A (en) Image editing method, device, electronic equipment and storage medium
CN113194256B (en) Shooting method, shooting device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination