WO2023093356A1

WO2023093356A1 - Image generation method and apparatus, and electronic device and storage medium

Info

Publication number: WO2023093356A1
Application number: PCT/CN2022/125425
Authority: WO
Inventors: 顾天培; 林纯泽; 王权; 钱晨
Original assignee: 上海商汤智能科技有限公司
Priority date: 2021-11-26
Filing date: 2022-10-14
Publication date: 2023-06-01
Also published as: CN113837934B; CN113837934A

Abstract

The present disclosure relates to an image generation method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring a realistic image generation network and a target stylized image generation network; inputting the same piece of random data into the target stylized image generation network and the realistic image generation network respectively, so as to obtain a target stylized image output by the target stylized image generation network and a target realistic image output by the realistic image generation network; and determining the target stylized image and the target realistic image that correspond to the same piece of random data as a pair of paired images.

Description

Image generation method and device, electronic device and storage medium

This disclosure claims priority to a Chinese patent application filed with the China Patent Office on November 26, 2021 with application number 202111417366.5 and titled "Image generation method and device, electronic device, and storage medium," the entire contents of which are incorporated by reference in this disclosure.

Technical field

The present disclosure relates to the field of computer technology, and in particular to an image generation method and device, electronic equipment and storage media.

Background

Image stylization is to convert the original image into a stylized image of a specific style, such as sketch portrait style, cartoon image style, oil painting style, etc.

Summary of the invention

The disclosure proposes a technical solution for image generation.

According to an aspect of the present disclosure, an image generation method is provided, including: obtaining a realistic image generation network and a target stylized image generation network; inputting the same random data into the target stylized image generation network and the realistic image generation network respectively, to obtain a target stylized image output by the target stylized image generation network and a target realistic image output by the realistic image generation network, the target stylized image having a target style, wherein the target stylized image generation network is obtained by fusing the realistic image generation network with an original stylized image generation network, and the original stylized image generation network is used to generate images with the target style; and determining the target stylized image and the target realistic image corresponding to the same random data as a pair of paired images. In this way, a large number of paired images can be generated from random data, which not only reduces the cost of constructing paired images, but also allows the target stylized image in each constructed pair to have both sufficient realistic details and a sufficient stylized effect.

In a possible implementation, the realistic image generation network and the original stylized image generation network each have N network layers, N being a positive integer, wherein fusing the realistic image generation network with the original stylized image generation network includes: replacing the first I network layers of the original stylized image generation network with the first I network layers of the realistic image generation network to obtain the target stylized image generation network, I∈[1,N); wherein the value of I is negatively correlated with the degree of stylization of the target stylized image generated by the target stylized image generation network. In this way, the target stylized image generated by the target stylized image generation network can have sufficient realistic details and a sufficient stylized effect.

In a possible implementation, the first I network layers are used to learn low-resolution information of images, and the low-resolution information includes edge contour information and style information of images; wherein replacing the first I network layers of the original stylized image generation network with the first I network layers of the realistic image generation network includes: exchanging the low-resolution information learned by the first I network layers of the original stylized image generation network with the low-resolution information learned by the first I network layers of the realistic image generation network. In this way, the target stylized image generation network can draw on both the low-resolution information learned by the realistic image generation network and the high-resolution information learned by the original stylized image generation network, and can therefore generate a target stylized image with sufficient realistic details and a sufficient stylized effect.

In a possible implementation, the realistic image generation network and the original stylized image generation network each have N network layers, N being a positive integer, wherein fusing the realistic image generation network with the original stylized image generation network also includes: replacing the last J network layers of the original stylized image generation network with the last J network layers of the realistic image generation network to obtain the target stylized image generation network, J∈[1,N); wherein the value of J is negatively correlated with the degree of stylization of the target stylized image generated by the target stylized image generation network. In this way, the target stylized image generated by the target stylized image generation network can have sufficient realistic details and a sufficient stylized effect.

In a possible implementation, the last J network layers are used to learn high-resolution information of images, and the high-resolution information includes detail information of images; wherein replacing the last J network layers of the original stylized image generation network with the last J network layers of the realistic image generation network includes: exchanging the high-resolution information learned by the last J network layers of the original stylized image generation network with the high-resolution information learned by the last J network layers of the realistic image generation network. In this way, the target stylized image generation network can draw on both the low-resolution information learned by the original stylized image generation network and the high-resolution information learned by the realistic image generation network, and can therefore generate a target stylized image with sufficient realistic details and a sufficient stylized effect.

In a possible implementation, the original stylized image generation network is obtained by performing transfer learning on the realistic image generation network based on stylized sample images, and the stylized sample images have the target style. In this way, the network structures of the realistic image generation network and the original stylized image generation network can be the same.

In a possible implementation, performing transfer learning on the realistic image generation network based on the stylized sample images includes: acquiring the realistic image generation network and the stylized sample images with the target style; and using the stylized sample images to perform transfer learning on the realistic image generation network to obtain the original stylized image generation network. In this way, the network structures of the realistic image generation network and the original stylized image generation network can be the same.

In a possible implementation, the realistic image generation network is obtained by training a generative adversarial network model whose image resolution increases progressively; the realistic image generation network has N network layers, every n network layers represent one resolution level, and the realistic image generation network is used to generate images of different resolutions level by level, where N is a positive integer and n∈[1,N).

In a possible implementation, there are multiple pairs of paired images, and the multiple pairs of paired images are used to train an initial network to obtain a target stylization network, the target stylization network being used to convert an input image into an image with the target style. In this way, the paired images can be used to effectively train a target stylization network that can convert an input image into an image with the target style.

According to an aspect of the present disclosure, an image generation device is provided, including: an acquisition module, used to obtain a realistic image generation network and a target stylized image generation network; an output module, used to input the same random data into the target stylized image generation network and the realistic image generation network respectively, to obtain a target stylized image output by the target stylized image generation network and a target realistic image output by the realistic image generation network, the target stylized image having a target style, wherein the target stylized image generation network is obtained by fusing the realistic image generation network with an original stylized image generation network, and the original stylized image generation network is used to generate images with the target style; and a determining module, configured to determine the target stylized image and the target realistic image corresponding to the same random data as a pair of paired images.

According to an aspect of the present disclosure, there is provided an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to call the instructions stored in the memory to execute the above-mentioned method.

According to one aspect of the present disclosure, there is provided a computer-readable storage medium, on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the above method is implemented.

According to one aspect of the present disclosure, a computer program is provided, including computer readable codes, and when the computer readable codes are run in an electronic device, a processor in the electronic device executes the above method.

In the embodiments of the present disclosure, the target stylized image generation network is obtained by fusing the realistic image generation network with the original stylized image generation network, and a large number of paired images can then be generated from random data. This not only reduces the cost of constructing paired images, but also allows the target stylized image in each constructed pair to have sufficient realistic details and a sufficient stylized effect. In addition, when the paired images are applied to network model training, a trained target stylization network can be obtained based on the paired images, and the obtained target stylization network can transform a realistic image into an image with sufficient realistic details and a sufficient stylized effect.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.

Description of drawings

The accompanying drawings here are incorporated into the description and constitute a part of the present description. These drawings show embodiments consistent with the present disclosure, and are used together with the description to explain the technical solution of the present disclosure.

FIG. 1 shows a flowchart of an image generation method according to an embodiment of the present disclosure.

Fig. 2 shows a schematic diagram of network fusion according to an embodiment of the present disclosure.

FIG. 3 shows a schematic diagram of a degree of stylization according to an embodiment of the disclosure.

FIG. 4 shows a schematic diagram of a degree of stylization according to an embodiment of the disclosure.

FIG. 5 shows a block diagram of an image generating device according to an embodiment of the present disclosure.

Fig. 6 shows a block diagram of an electronic device according to an embodiment of the present disclosure.

Detailed description

Various exemplary embodiments, features, and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. The same reference numbers in the figures indicate functionally identical or similar elements. While various aspects of the embodiments are shown in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.

The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as superior or better than other embodiments.

The term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships are possible; for example, A and/or B can mean: A exists alone, both A and B exist, or B exists alone. In addition, the term "at least one" herein means any one of multiple items or any combination of at least two of multiple items; for example, including at least one of A, B, and C can mean including any one or more elements selected from the set formed by A, B, and C.

In addition, in order to better illustrate the present disclosure, numerous specific details are given in the following detailed description. It will be understood by those skilled in the art that the present disclosure may be practiced without some of the specific details. In some instances, methods, means, components and circuits that are well known to those skilled in the art have not been described in detail, so as not to obscure the gist of the present disclosure.

Fig. 1 shows a flow chart of an image generation method according to an embodiment of the present disclosure. The image generation method may be executed by an electronic device such as a terminal device or a server. The terminal device may be user equipment (User Equipment, UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc. The method may be implemented by a processor calling computer-readable instructions stored in a memory, or the method may be executed by a server. As shown in Figure 1, the image generation method includes:

In step S11, a realistic image generation network and a target stylized image generation network are acquired.

Here, the realistic image generation network is used to generate realistic images, and the target stylized image generation network is used to generate target stylized images with the target style. Realistic images can be understood as images without the target style, and target stylized images can be understood as images with the target style. In a possible implementation, the target style can include any image style such as sketch portrait style, cartoon image style, oil painting style, comic style, etc., where the comic style can include at least, for example: SD doll, becoming a child, CG style 1, Impasto, dark Korean comics, Korean comics, CG style 2. It should be understood that the embodiments of the present disclosure do not limit the type of the target style.

In a possible implementation, the realistic image generation network can be obtained by training a resolution-increasing generative adversarial network model, where the resolution-increasing generative adversarial network model can include, for example, ProgressiveGAN, StyleGAN, StyleGANv1, StyleGANv2, StyleGANv3 and other network models derived from StyleGAN.

This kind of resolution-increasing generative adversarial network model includes a generation network G and a discriminant network D, and its basic principle can be simply understood as follows: random data z (also called random noise) is input into the generation network G; the generation network G generates an image G(z) from this random data; the generated image G(z) is input into the discriminant network D; and the discriminant network D judges whether the input image is real or is an image G(z) generated by the generation network G.

Here, "resolution-increasing" can be understood to mean that the shallow network layers of the generation network G first learn and generate low-resolution images (such as 4×4 resolution), and then, as the network depth increases, the network gradually learns and generates higher-resolution images (such as 1024×1024 resolution).

In a possible implementation, the realistic image generation network has N network layers, every n network layers represent one resolution level, and the realistic image generation network is used to generate images of different resolutions level by level, where N is a positive integer and n∈[1,N). For example, a StyleGAN with 18 network layers can be used, in which every two layers represent one resolution level, and images at different resolution levels from 4×4 to 1024×1024 are generated. In this way, fusing the realistic image generation network with the original stylized image generation network is equivalent to exchanging the low-resolution information and the high-resolution information learned by the two networks, where the low-resolution information includes edge contour information and style information of the image, and the high-resolution information includes detail information of the image.
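The layer-to-resolution relationship in the 18-layer example can be sketched as follows. This is only an illustration: the function name `resolution_schedule` and its parameters are hypothetical, and the assumption that the resolution doubles from one level to the next is inferred from the 4×4 and 1024×1024 endpoints rather than stated in the disclosure.

```python
def resolution_schedule(num_layers=18, layers_per_level=2, base_resolution=4):
    """Map each 1-based layer index to the output resolution of its level.

    Assumption (not stated in the source): resolution doubles per level.
    """
    if num_layers % layers_per_level != 0:
        raise ValueError("num_layers must be a multiple of layers_per_level")
    schedule = {}
    for layer in range(1, num_layers + 1):
        level = (layer - 1) // layers_per_level  # 0-based resolution level
        schedule[layer] = base_resolution * 2 ** level
    return schedule


schedule = resolution_schedule()
for layer in sorted(schedule):
    print(f"layer {layer:2d}: {schedule[layer]}x{schedule[layer]}")
```

With these defaults, layers 1 and 2 map to 4×4 and layers 17 and 18 map to 1024×1024, giving nine resolution levels in total.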

In the above-mentioned training process of the resolution-increasing generative adversarial network model, the goal of the generation network G is to generate images realistic enough to deceive the discriminant network D, and the goal of the discriminant network D is to identify, as far as possible, the images generated by the generation network G. The adversarial loss between the generation network G and the discriminant network D can be used to train this type of generative adversarial network model, and the trained generation network G can be used as the realistic image generation network. It should be understood that those skilled in the art may use network training methods known in the art to train this type of resolution-increasing generative adversarial network model, which is not limited by the embodiments of the present disclosure.
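As an illustration of the opposing goals of the two networks, the following sketch computes one commonly used form of adversarial loss (the non-saturating logistic form) from discriminator outputs. The disclosure does not specify which loss is used, so this choice, and the function names `discriminator_loss` and `generator_loss`, are assumptions made purely for illustration.

```python
import math


def discriminator_loss(d_real, d_fake):
    """-log D(x) - log(1 - D(G(z))).

    d_real and d_fake are the discriminator's probabilities (in (0, 1))
    that a real image and a generated image, respectively, are real.
    The loss is small when D is right about both.
    """
    return -math.log(d_real) - math.log(1.0 - d_fake)


def generator_loss(d_fake):
    """-log D(G(z)): small when the generated image fools D."""
    return -math.log(d_fake)


# A confident, correct discriminator has a lower loss than an undecided one.
print(discriminator_loss(0.9, 0.1), discriminator_loss(0.5, 0.5))
# The generator's loss falls as D is increasingly fooled.
print(generator_loss(0.1), generator_loss(0.9))
```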

In a possible implementation, the target stylized image generation network is obtained by fusing the realistic image generation network with the original stylized image generation network, and the original stylized image generation network is used to generate images with the target style; the original stylized image generation network is obtained by performing transfer learning on the realistic image generation network based on stylized sample images, which have the target style.

As mentioned above, the original stylized image generation network can be obtained by performing transfer learning on the realistic image generation network, so the network structures of the realistic image generation network and the original stylized image generation network can be the same. In a possible implementation, the realistic image generation network and the original stylized image generation network can be exchanged and fused at specific network layers to obtain the target stylized image generation network; for example, some network layers of the realistic image generation network are replaced with the corresponding network layers of the original stylized image generation network.

In this way, the target stylized image generated by the target stylized image generation network can have sufficient realistic details and sufficient stylized effects.

In step S12, the same random data is input into the target stylized image generation network and the realistic image generation network respectively, to obtain the target stylized image output by the target stylized image generation network and the target realistic image output by the realistic image generation network, where the target stylized image has the target style.

As mentioned above, random data z (also called random noise) is input into the generation network G, and the generation network G generates an image G(z) from this random data; that is, the realistic image generation network and the target stylized image generation network each use the random data to upsample step by step, or to increase the resolution step by step, in order to generate an image. On this basis, the same random data can be input into the two networks respectively to generate a paired target realistic image and target stylized image.

In this way, paired images can be efficiently generated, and the target realistic image and the target stylized image correspond to each other; in other words, the target stylized image is equivalent to a stylized version of the target realistic image.

In a possible implementation, the random data may include any type of value, such as a random vector, a feature matrix, or a random tensor, where the random vector may be a latent vector that follows a Gaussian distribution, which is not limited by the embodiments of the present disclosure.

In step S13, the target stylized image and the target realistic image corresponding to the same random data are determined as a pair of paired images.

There may be multiple pieces of random data, so multiple pairs of paired images may be generated. It can be understood that the target realistic image and the target stylized image in each pair of paired images are generated from the same random data.
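Steps S11 to S13 can be sketched as the following loop. The two generator functions are hypothetical stand-ins (simple arithmetic on the latent vector) for the realistic and target stylized image generation networks, and the seeded Gaussian latent vector is one of the random data types mentioned above; none of the names below come from the disclosure.

```python
import random


def sample_latent(seed, dim=8):
    """Draw a Gaussian latent vector z; the seed makes it reproducible."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def realistic_generator(z):
    """Hypothetical stand-in for the realistic image generation network."""
    return [round(v, 3) for v in z]


def stylized_generator(z):
    """Hypothetical stand-in for the target stylized image generation network."""
    return [round(2.0 * v + 1.0, 3) for v in z]


def make_pairs(seeds, dim=8):
    """Feed the same z to both networks and keep the two outputs as a pair."""
    pairs = []
    for seed in seeds:
        z = sample_latent(seed, dim)
        pairs.append((stylized_generator(z), realistic_generator(z)))
    return pairs


pairs = make_pairs(range(4))
print(len(pairs), "pairs generated")
```

The essential point is that both outputs in a pair come from the same `z`; a different `z` yields a different pair.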

In a possible implementation, the paired images can be used to train an initial network to obtain a trained network capable of transforming an input image into an image with a target style. Wherein, the initial network can adopt a deep learning network model, for example, a convolutional neural network, an adversarial neural network and other network models can be used. It should be understood that the embodiments of the present disclosure do not limit the network structure, network type, training method, etc. of the initial network.

In the embodiments of the present disclosure, the target stylized image generation network is obtained by fusing the realistic image generation network with the original stylized image generation network, so the target stylized image generated by the target stylized image generation network can have sufficient realistic details and a sufficient stylized effect; thus, high-quality paired images can be efficiently generated using the realistic image generation network and the target stylized image generation network.

As mentioned above, the realistic image generation network and the original stylized image generation network can be exchanged and fused at specific network layers to obtain the target stylized image generation network. In a possible implementation, the realistic image generation network and the original stylized image generation network each have N network layers, N being a positive integer, and fusing the realistic image generation network with the original stylized image generation network can include:

Replacing the first I network layers of the original stylized image generation network with the first I network layers of the realistic image generation network to obtain the target stylized image generation network, I∈[1,N).

Here, replacing the first I network layers of the original stylized image generation network with the first I network layers of the realistic image generation network means splicing the first I network layers of the realistic image generation network with the last N-I network layers of the original stylized image generation network.

For example, FIG. 2 shows a schematic diagram of network fusion according to an embodiment of the present disclosure. As shown in FIG. 2, the first N/2 network layers of the realistic image generation network can be spliced with the last N/2 network layers of the original stylized image generation network; that is, the first N/2 network layers of the original stylized image generation network are replaced with the first N/2 network layers of the realistic image generation network to obtain the target stylized image generation network.
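The front-layer replacement described above can be sketched with plain Python lists standing in for the two networks' layers. The function name `fuse_front` and the string layer labels are hypothetical; in a real model each list element would be a layer's parameters.

```python
def fuse_front(realistic_layers, stylized_layers, i):
    """Splice the first i realistic layers with the last N - i stylized layers."""
    n = len(realistic_layers)
    if len(stylized_layers) != n:
        raise ValueError("both networks must have the same number of layers")
    if not 1 <= i < n:
        raise ValueError("i must satisfy 1 <= i < N")
    return list(realistic_layers[:i]) + list(stylized_layers[i:])


N = 18
realistic = [f"R{k}" for k in range(1, N + 1)]  # layers of the realistic network
stylized = [f"S{k}" for k in range(1, N + 1)]   # layers of the original stylized network

# The FIG. 2 case: I = N/2.
target = fuse_front(realistic, stylized, N // 2)
print(target)
```

Because both inputs have the same N-layer structure, the fused list is again a valid N-layer network with every layer position filled exactly once.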

Here, the output resolution of the N network layers increases layer by layer; that is, the first several of the N network layers can be regarded as low-resolution layers, and the last several network layers can be regarded as high-resolution layers. It follows that the target stylized image generation network obtained through the embodiments of the present disclosure can be simply understood as follows: in the process of generating the target stylized image, realistic intermediate images are first generated step by step from the random data, and a stylized effect is then added step by step to the realistic intermediate images to obtain the target stylized image.

As mentioned above, performing network fusion on the realistic image generation network and the original stylized image generation network is equivalent to exchanging the low-resolution information and the high-resolution information learned by the two networks respectively, where the low-resolution information includes edge contour information and style information of the image, and the high-resolution information includes detail information of the image. That is, the first I network layers of the two networks are used to learn low-resolution information, and the last N-I network layers are used to learn high-resolution information.

In a possible implementation, replacing the first I network layers of the original stylized image generation network with the first I network layers of the realistic image generation network includes: exchanging the low-resolution information learned by the first I network layers of the original stylized image generation network with the low-resolution information learned by the first I network layers of the realistic image generation network. In this way, the target stylized image generation network can draw on both the low-resolution information learned by the realistic image generation network and the high-resolution information learned by the original stylized image generation network, and can therefore generate a target stylized image with sufficient realistic details and a sufficient stylized effect.

In a possible implementation, the value of I can be set according to the required degree of stylization, wherein the value of I is negatively correlated with the degree of stylization of the target stylized image generated by the target stylized image generation network. That is, the smaller the value of I, the closer the target stylized image is to the stylized effect (or the less it resembles a realistic image), i.e., the higher the degree of stylization; the larger the value of I, the closer the target stylized image is to the realistic effect (or the more it resembles a realistic image), i.e., the lower the degree of stylization. Fig. 3 shows a schematic diagram of a degree of stylization according to an embodiment of the present disclosure. As shown in Fig. 3, the larger the value of I, the closer the image is to the realistic effect, that is, the more it resembles a real human face; the smaller the value of I, the closer the image is to the stylized effect, that is, the less it resembles a real human face.

In the embodiment of the present disclosure, the target stylized image generated by the target stylized image generation network can have sufficient realistic details and sufficient stylized effect.

In a possible implementation, the realistic image generation network and the original stylized image generation network each have N network layers, N being a positive integer, and fusing the realistic image generation network with the original stylized image generation network can also include:

Replacing the last J network layers of the original stylized image generation network with the last J network layers of the realistic image generation network to obtain the target stylized image generation network, J∈[1,N).

Here, replacing the last J network layers of the original stylized image generation network with the last J network layers of the realistic image generation network means splicing the last J network layers of the realistic image generation network with the first N-J network layers of the original stylized image generation network.
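The back-layer replacement can be sketched in the same way, again with lists of hypothetical layer labels. The helper `provenance` is an illustrative aid (not part of the disclosure) that shows which network each layer of the fused result came from, which makes the effect of varying J easy to see.

```python
def fuse_back(realistic_layers, stylized_layers, j):
    """Splice the first N - j stylized layers with the last j realistic layers."""
    n = len(realistic_layers)
    if len(stylized_layers) != n:
        raise ValueError("both networks must have the same number of layers")
    if not 1 <= j < n:
        raise ValueError("j must satisfy 1 <= j < N")
    return list(stylized_layers[:n - j]) + list(realistic_layers[n - j:])


def provenance(layers):
    """One letter per layer: 'S' for stylized origin, 'R' for realistic origin."""
    return "".join(name[0] for name in layers)


N = 8
realistic = [f"R{k}" for k in range(1, N + 1)]
stylized = [f"S{k}" for k in range(1, N + 1)]

for j in (1, 4, 7):
    print(j, provenance(fuse_back(realistic, stylized, j)))
```

As J grows, more of the high-resolution layers are taken from the realistic network, so fewer stylized layers remain in the fused network.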

As mentioned above, the output resolution of the N network layers increases layer by layer; that is, the first several of the N network layers can be regarded as low-resolution layers, and the last several network layers can be regarded as high-resolution layers. It follows that the target stylized image generation network obtained through the embodiments of the present disclosure can be simply understood as follows: in the process of generating the target stylized image, stylized intermediate images are first generated step by step from the random data, and realistic details are then added step by step to the stylized intermediate images to obtain the target stylized image.

As mentioned above, performing network fusion on the realistic image generation network and the original stylized image generation network is equivalent to exchanging the low-resolution information and the high-resolution information learned by the two networks respectively, where the low-resolution information includes edge contour information and style information of the image, and the high-resolution information includes detail information of the image. That is, the first N-J network layers of the two networks are used to learn low-resolution information, and the last J network layers are used to learn high-resolution information.

In a possible implementation, replacing the last J network layers of the original stylized image generation network with the last J network layers of the realistic image generation network includes: exchanging the high-resolution information learned by the last J network layers of the original stylized image generation network with the high-resolution information learned by the last J network layers of the realistic image generation network. In this way, the target stylized image generation network can draw on both the low-resolution information learned by the original stylized image generation network and the high-resolution information learned by the realistic image generation network, and can therefore generate a target stylized image with sufficient realistic details and a sufficient stylized effect.

In a possible implementation, the value of J can be set according to specific stylization requirements, wherein the value of J is negatively correlated with the stylization degree of the target stylized image generated by the target stylized image generation network. This can be understood as follows: the larger the value of J, the closer the generated target stylized image is to a realistic effect (that is, the more it resembles a real image) and the lower the degree of stylization; the smaller the value of J, the closer the target stylized image is to a stylized effect (that is, the less it resembles a real image) and the higher the degree of stylization. Fig. 4 shows a schematic diagram of the degree of stylization according to an embodiment of the present disclosure. As shown in Fig. 4, the smaller the value of J, the higher the degree of stylization, that is, the less the image resembles a real human face; the larger the value of J, the lower the degree of stylization, that is, the more the image resembles a real human face.
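The layer replacement described above can be illustrated with a minimal sketch. It assumes that each network is represented as an ordered sequence of N layers; the function name `fuse_last_j` and the labelled strings standing in for layers are hypothetical and serve only to make the effect of J visible.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")  # any layer representation (weights, modules, labels, ...)


def fuse_last_j(stylized_layers: Sequence[T],
                realistic_layers: Sequence[T],
                j: int) -> List[T]:
    """Keep the first N-J stylized layers and take the last J realistic layers.

    Hypothetical helper: the disclosure only states that the last J layers
    of the original stylized network are replaced, with J in [1, N).
    """
    n = len(stylized_layers)
    if len(realistic_layers) != n:
        raise ValueError("both networks must have the same number of layers N")
    if not 1 <= j < n:
        raise ValueError("J must satisfy 1 <= J < N")
    return list(stylized_layers[:n - j]) + list(realistic_layers[n - j:])


# Toy usage with N = 6; labels stand in for real layers (assumption).
stylized = [f"S{k}" for k in range(1, 7)]
realistic = [f"R{k}" for k in range(1, 7)]

fused_small_j = fuse_last_j(stylized, realistic, j=1)  # more stylized
fused_large_j = fuse_last_j(stylized, realistic, j=4)  # more realistic
```

With a small J almost all layers come from the stylized network, and with a large J most layers come from the realistic network, which matches the negative correlation between J and the degree of stylization.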

As mentioned above, the original stylized image generation network is obtained by performing transfer learning on the realistic image generation network based on stylized sample images, which includes:

Obtain a realistic image generation network and a stylized sample image with the target style; use the stylized sample image to perform transfer learning on the realistic image generation network to obtain the original stylized image generation network.

The realistic image generation network may be the generation network D trained according to the above-mentioned network training process. Transfer learning can be understood as enabling the realistic image generation network to learn the target style in the stylized sample images, so that it generates stylized images with the target style, that is, so that the original stylized image generation network is obtained.

It should be understood that those skilled in the art can use transfer learning techniques known in the art to perform transfer learning on the realistic image generation network using the stylized sample images to obtain the original stylized image generation network, which is not limited in the embodiments of the present disclosure.

In a possible implementation, the original stylized image generation network can instead be obtained first, by training the above-mentioned image generative adversarial network model with increasing resolution with reference to the training method of the above-mentioned realistic image generation network; transfer learning is then performed on the original stylized image generation network based on sample images to obtain the realistic image generation network, which is not limited in the embodiments of the present disclosure.

In the embodiments of the present disclosure, the original stylized image generation network can be obtained efficiently, and it retains the network structure of the realistic image generation network without any increase in its number of parameters, which facilitates the subsequent fusion of the realistic image generation network with the original stylized image generation network.

As mentioned above, the same random data can be input into the target stylized image generation network and the realistic image generation network respectively to obtain the target stylized image and the target realistic image. There can be multiple pieces of random data, that is, there can be multiple pairs of paired images. In a possible implementation, the multiple pairs of paired images can be used to train an initial network to obtain a target stylization network, and the target stylization network is used to convert an input original image into an image with the target style.
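The construction of paired images from shared random data can be sketched as follows. This is a minimal illustration that assumes each generation network can be called as a function mapping a random vector to an image; `make_pairs`, the toy generators, and the flat-list image representation are all hypothetical stand-ins rather than the networks of the disclosure.

```python
import random
from typing import Callable, List, Sequence, Tuple

Image = List[float]                          # toy image: a flat list of pixel values
Generator = Callable[[Sequence[float]], Image]


def make_pairs(stylized_gen: Generator,
               realistic_gen: Generator,
               num_pairs: int,
               latent_dim: int,
               seed: int = 0) -> List[Tuple[Image, Image]]:
    """Feed each random vector into BOTH generators and pair the two outputs."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_pairs):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]  # one piece of random data
        pairs.append((realistic_gen(z), stylized_gen(z)))     # (realistic, stylized)
    return pairs


# Toy generators (assumptions): the "stylized" output is a shifted copy of the
# "realistic" output, so that each pair is visibly derived from the same z.
def toy_realistic(z: Sequence[float]) -> Image:
    return [2.0 * v for v in z]


def toy_stylized(z: Sequence[float]) -> Image:
    return [2.0 * v + 1.0 for v in z]


pairs = make_pairs(toy_stylized, toy_realistic, num_pairs=3, latent_dim=4)
```

Because both images in a pair come from the same random vector, no manual pairing or annotation step is needed, and the number of pairs is limited only by how many random vectors are sampled.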

As mentioned above, the initial network can adopt a deep learning network model known in the art, for example, a convolutional neural network, an adversarial neural network and other network models can be used. It should be understood that the embodiment of the present disclosure does not limit the network structure and network type of the initial network.

In a possible implementation, the process of training the initial network with the paired images may include, for example: inputting the target realistic image of a paired image into the initial network to obtain a predicted stylized image output by the initial network; and, according to the loss between the predicted stylized image and the target stylized image of the paired image, optimizing the network parameters of the initial network through gradient descent, back propagation and the like until the loss converges, so as to obtain the target stylization network.

The loss between the predicted stylized image and the target stylized image can be determined according to the distance between the two images, for example the L1 distance or the L2 distance, using a specified loss function (such as an L1 loss function or an L2 loss function).
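The training procedure and the distance-based loss can be sketched with a deliberately tiny model. The sketch assumes the initial network is a per-pixel affine map `w * x + b` trained by plain gradient descent on the L2 loss; this toy model, the function names, and the hyperparameters are illustrative assumptions and not the network of the disclosure.

```python
from typing import List, Sequence, Tuple

Image = List[float]


def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute (L1) distance between two images."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def l2_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared (L2) distance between two images."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def train_initial_network(pairs: Sequence[Tuple[Image, Image]],
                          lr: float = 0.1,
                          steps: int = 1000) -> Tuple[float, float]:
    """Fit a toy per-pixel model y = w * x + b on (realistic, stylized) pairs."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        count = 0
        for realistic, stylized in pairs:
            for x, y in zip(realistic, stylized):
                err = (w * x + b) - y        # predicted minus target pixel
                grad_w += 2.0 * err * x      # d(err^2)/dw
                grad_b += 2.0 * err          # d(err^2)/db
                count += 1
        w -= lr * grad_w / count             # gradient descent update
        b -= lr * grad_b / count
    return w, b


# Toy paired data (assumption): stylized pixel = 2 * realistic pixel + 1.
toy_pairs = [([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])]
w, b = train_initial_network(toy_pairs)
predicted = [w * x + b for x in toy_pairs[0][0]]
final_loss = l2_loss(predicted, toy_pairs[0][1])
```

In practice the initial network would be a deep model and the gradients would come from back propagation, but the loop has the same shape: predict from the realistic image, measure the distance to the paired stylized image, and update the parameters until the loss converges.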

It should be understood that the above-mentioned process of training the initial network with the paired images is one implementation provided by the embodiments of the present disclosure. In fact, those skilled in the art can use any network training method known in the art to train the initial network with the paired images and obtain the trained target stylization network.

In a possible implementation, after the trained target stylization network is obtained, it can be applied to short video applications, photography applications, game applications, social applications, and comic face generation tools of various styles, in which an actually captured face image can be converted into a stylized face image with the target style by using the target stylization network.

In the embodiments of the present disclosure, the paired images can be used to effectively train a target stylization network capable of converting an input image into an image with the target style.

According to the image generation method in the embodiments of the present disclosure, the user needs to provide only a small number of stylized sample images to obtain the original stylized image generation network and the target stylized image generation network, and random data can then be used to generate a large number of paired images. This not only reduces the cost of constructing paired images, but also ensures that the target stylized images in the constructed paired images have sufficient realistic details and a sufficient stylized effect. In addition, when the paired images are applied to network model training, a trained target stylization network can be obtained based on the paired images, and the obtained target stylization network can convert a realistic image into an image with sufficient realistic details and a sufficient stylized effect.

It can be understood that the above-mentioned method embodiments of the present disclosure can all be combined with each other to form combined embodiments without departing from their principles and logic; due to space limitations, these are not described again in the present disclosure. Those skilled in the art can understand that, in the above methods of the specific implementations, the specific execution order of the steps should be determined according to their functions and possible internal logic.

In addition, the present disclosure also provides image generation devices, electronic devices, computer-readable storage media, and programs, all of which can be used to implement any image generation method provided in the present disclosure. For the corresponding technical solutions and descriptions, refer to the corresponding records in the method section, which are not repeated here.

Fig. 5 shows a block diagram of an image generation device according to an embodiment of the present disclosure. As shown in Fig. 5, the device includes:

An acquisition module 101, configured to acquire a realistic image generation network and a target stylized image generation network;

The output module 102 is configured to input the same random data into the target stylized image generation network and the realistic image generation network respectively, to obtain the target stylized image output by the target stylized image generation network and the target realistic image output by the realistic image generation network, the target stylized image having the target style, wherein the target stylized image generation network is obtained by fusing the realistic image generation network with the original stylized image generation network, and the original stylized image generation network is used to generate an image with the target style;

The determining module 103 is configured to determine the target stylized image and the target realistic image corresponding to the same random data as a pair of paired images.
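The three modules listed above can be sketched as one small class. This is only an illustrative arrangement under the assumption that each generation network is a callable mapping random data to an image; the class name `ImageGenerationDevice` and its method names are hypothetical, and the reference numerals appear only in comments.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Image = List[float]
Generator = Callable[[Sequence[float]], Image]


@dataclass
class ImageGenerationDevice:
    realistic_gen: Generator         # realistic image generation network
    target_stylized_gen: Generator   # fused target stylized image generation network

    def acquire(self) -> Tuple[Generator, Generator]:
        """Acquisition module (101): provide both generation networks."""
        return self.realistic_gen, self.target_stylized_gen

    def output(self, z: Sequence[float]) -> Tuple[Image, Image]:
        """Output module (102): run the SAME random data through both networks."""
        realistic_gen, stylized_gen = self.acquire()
        return stylized_gen(z), realistic_gen(z)

    def determine(self, stylized: Image, realistic: Image) -> Tuple[Image, Image]:
        """Determining module (103): treat the two outputs as one paired image."""
        return (realistic, stylized)


# Toy usage with stand-in generators (assumptions, not real networks).
device = ImageGenerationDevice(
    realistic_gen=lambda z: [v for v in z],
    target_stylized_gen=lambda z: [-v for v in z],
)
stylized_img, realistic_img = device.output([1.0, 2.0])
pair = device.determine(stylized_img, realistic_img)
```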

In a possible implementation, the realistic image generation network and the original stylized image generation network each have N network layers, N being a positive integer, wherein fusing the realistic image generation network with the original stylized image generation network includes: replacing the first I network layers of the original stylized image generation network with the first I network layers of the realistic image generation network to obtain the target stylized image generation network, I∈[1,N); wherein the value of I is negatively correlated with the stylization degree of the target stylized image generated by the target stylized image generation network.

In a possible implementation, the first I network layers are used to learn low-resolution information of images, and the low-resolution information includes edge contour information and style information of images; wherein replacing the first I network layers of the original stylized image generation network with the first I network layers of the realistic image generation network includes: exchanging the low-resolution information learned by the first I network layers of the original stylized image generation network with the low-resolution information learned by the first I network layers of the realistic image generation network.

In a possible implementation, the realistic image generation network and the original stylized image generation network each have N network layers, N being a positive integer, wherein fusing the realistic image generation network with the original stylized image generation network also includes: replacing the last J network layers of the original stylized image generation network with the last J network layers of the realistic image generation network to obtain the target stylized image generation network, J∈[1,N); wherein the value of J is negatively correlated with the stylization degree of the target stylized image generated by the target stylized image generation network.

In a possible implementation, the last J network layers are used to learn high-resolution information of images, and the high-resolution information includes detail information of images; wherein replacing the last J network layers of the original stylized image generation network with the last J network layers of the realistic image generation network includes: exchanging the high-resolution information learned by the last J network layers of the original stylized image generation network with the high-resolution information learned by the last J network layers of the realistic image generation network.

In a possible implementation manner, the original stylized image generation network is obtained by performing transfer learning on the realistic image generation network based on a stylized sample image, and the stylized sample image has the target style.

In a possible implementation, performing transfer learning on the realistic image generation network based on the stylized sample image includes: acquiring the realistic image generation network and the stylized sample image with the target style; and using the stylized sample image to perform transfer learning on the realistic image generation network to obtain the original stylized image generation network.

In a possible implementation, there are multiple pairs of paired images, the multiple pairs of paired images are used to train an initial network to obtain a target stylization network, and the target stylization network is used to convert an input image into an image with the target style.

In some embodiments, the functions or modules included in the device provided by the embodiments of the present disclosure can be used to execute the methods described in the method embodiments above. For their specific implementation, refer to the description of the method embodiments above; for brevity, details are not repeated here.

Embodiments of the present disclosure also provide a computer-readable storage medium, on which computer program instructions are stored, and the above-mentioned method is implemented when the computer program instructions are executed by a processor. Computer readable storage media may be volatile or nonvolatile computer readable storage media.

An embodiment of the present disclosure also proposes an electronic device, including: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to invoke the instructions stored in the memory to execute the above method.

An embodiment of the present disclosure also provides a computer program, including computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device executes the above method. An embodiment of the present disclosure also provides a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code, wherein when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes the above method.

Electronic devices may be provided as terminals, servers, or other forms of devices.

The present disclosure relates to the field of augmented reality. Image information of a target object in a real environment is acquired, and various vision-related algorithms are then used to detect or identify relevant features, states and attributes of the target object, so as to obtain an AR effect that combines the virtual and the real and matches a specific application. Exemplarily, the target object may involve faces, limbs, gestures, actions, etc. related to the human body, or identifiers and markers related to objects, or sand tables, display areas or display items related to venues or places. The vision-related algorithms may involve visual positioning, SLAM, 3D reconstruction, image registration, background segmentation, object key point extraction and tracking, object pose or depth detection, etc. Specific applications may involve not only interactive scenarios related to real scenes or objects, such as guided tours, navigation, explanation, reconstruction, and virtual effect overlay and display, but also interactive scenarios involving special effects processing related to people, such as makeup beautification, body beautification, special effect display, and virtual network display. The relevant features, states and attributes of the target object can be detected or identified through a convolutional neural network. The above-mentioned convolutional neural network is a network obtained by performing network training based on a deep learning framework.

FIG. 6 shows a block diagram of an electronic device 1900 according to an embodiment of the present disclosure. For example, the electronic device 1900 may be provided as a server or terminal. Referring to FIG. 6, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and a memory resource represented by a memory 1932 for storing instructions executable by the processing component 1922, such as application programs. The application programs stored in the memory 1932 may include one or more modules each corresponding to a set of instructions. In addition, the processing component 1922 is configured to execute instructions to perform the above method.

The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output interface 1958. The electronic device 1900 can operate based on an operating system stored in the memory 1932, such as the Microsoft server operating system (Windows Server™), the graphical-user-interface-based operating system introduced by Apple Inc. (Mac OS X™), the multi-user, multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or the like.

In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium, such as the memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to implement the above method.

The present disclosure can be a system, method and/or computer program product. A computer program product may include a computer readable storage medium having computer readable program instructions thereon for causing a processor to implement various aspects of the present disclosure.

A computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. A computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. As used herein, computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., pulses of light through fiber optic cables), or electrical signals transmitted through wires.

Computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or downloaded to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or a network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device.

Computer program instructions for performing the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, via the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), is customized by utilizing state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement various aspects of the present disclosure.

Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce an apparatus for implementing the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions can also be stored in a computer-readable storage medium, and these instructions cause computers, programmable data processing devices and/or other devices to work in a specific way, so that the computer-readable medium storing the instructions constitutes an article of manufacture that includes instructions for implementing various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.

It is also possible to load computer-readable program instructions into a computer, other programmable data processing device, or other equipment, so that a series of operational steps are performed on the computer, other programmable data processing device, or other equipment to produce a computer-implemented process, so that the instructions executed on the computer, other programmable data processing device, or other equipment implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of an instruction, which includes one or more executable instructions. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved. It should also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified function or action, or may be implemented by a combination of dedicated hardware and computer instructions.

The computer program product can be specifically realized by means of hardware, software or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium, and in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).

Various embodiments of the present disclosure have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and alterations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen to best explain the principles of the embodiments, their practical application, or their improvement over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

A method for generating an image, comprising:

Obtain a realistic image generation network and a target stylized image generation network;

Inputting the same random data into the target stylized image generation network and the realistic image generation network respectively, to obtain a target stylized image output by the target stylized image generation network and a target realistic image output by the realistic image generation network, the target stylized image having a target style, wherein the target stylized image generation network is obtained by fusing the realistic image generation network with an original stylized image generation network, and the original stylized image generation network is used to generate images with the target style;

The target stylized image and the target realistic image corresponding to the same random data are determined as a pair of paired images.
The method according to claim 1, wherein the realistic image generation network and the original stylized image generation network each have N network layers, N being a positive integer, and wherein fusing the realistic image generation network with the original stylized image generation network includes:

The first I network layers of the original stylized image generation network are replaced by the first I network layers of the realistic image generation network, to obtain the target stylized image generation network, I∈[1,N); wherein the value of I is negatively correlated with the stylization degree of the target stylized image generated by the target stylized image generation network.
The method according to claim 2, characterized in that the first I network layers of the realistic image generation network and of the original stylized image generation network are used to learn low-resolution information of images, and the low-resolution information includes image edge contour information and style information;

Wherein replacing the first I network layers of the original stylized image generation network with the first I network layers of the realistic image generation network includes:

exchanging the low-resolution information learned by the first I network layers of the original stylized image generation network with the low-resolution information learned by the first I network layers of the realistic image generation network.
The method according to claim 1, wherein the realistic image generation network and the original stylized image generation network each have N network layers, N being a positive integer, and wherein fusing the realistic image generation network with the original stylized image generation network also includes:

The last J network layers of the original stylized image generation network are replaced by the last J network layers of the realistic image generation network to obtain the target stylized image generation network, J∈[1,N); wherein the value of J is negatively correlated with the stylization degree of the target stylized image generated by the target stylized image generation network.
The method according to claim 4, characterized in that the last J network layers of the realistic image generation network and of the original stylized image generation network are used to learn high-resolution information of images, and the high-resolution information includes image detail information;

Wherein replacing the last J network layers of the original stylized image generation network with the last J network layers of the realistic image generation network includes:

exchanging the high-resolution information learned by the last J network layers of the original stylized image generation network with the high-resolution information learned by the last J network layers of the realistic image generation network.
The method according to any one of claims 1 to 5, wherein the original stylized image generation network is obtained by performing transfer learning on the realistic image generation network based on stylized sample images, and the stylized sample images have the target style.
The method according to claim 6, wherein the transfer learning of the realistic image generation network based on the stylized sample image comprises:

Obtaining the realistic image generation network and a stylized sample image with the target style;

Using the stylized sample image, transfer learning is performed on the realistic image generation network to obtain the original stylized image generation network.
The method according to any one of claims 1 to 5, wherein the realistic image generation network is obtained by performing network training on an image generative adversarial network model with increasing resolution, the realistic image generation network has N network layers, every n network layers represent one resolution level, and the realistic image generation network is used to generate images of different resolutions level by level, where N is a positive integer and n∈[1,N).
The method according to any one of claims 1 to 5, wherein there are multiple pairs of the paired images, the multiple pairs of paired images are used to train an initial network to obtain a target stylization network, and the target stylization network is used to convert an input image into an image with the target style.
An image generating device, characterized in that it comprises:

An acquisition module, configured to acquire a realistic image generation network and a target stylized image generation network;

An output module, configured to input the same random data into the target stylized image generation network and the realistic image generation network respectively, to obtain a target stylized image output by the target stylized image generation network and a target realistic image output by the realistic image generation network, the target stylized image having a target style, wherein the target stylized image generation network is obtained by fusing the realistic image generation network with an original stylized image generation network, and the original stylized image generation network is used to generate an image with the target style;

A determining module, configured to determine the target stylized image and the target realistic image corresponding to the same random data as a pair of paired images.
The device according to claim 10, wherein the realistic image generation network and the original stylized image generation network each have N network layers, N being a positive integer, and wherein fusing the realistic image generation network with the original stylized image generation network comprises:

Replacing the first I network layers of the original stylized image generation network with the first I network layers of the realistic image generation network to obtain the target stylized image generation network, where I∈[1,N), and the value of I is negatively correlated with the degree of stylization of the target stylized image generated by the target stylized image generation network.
The device according to claim 11, characterized in that the first I network layers of the realistic image generation network and of the original stylized image generation network are used to learn low-resolution information of images, and the low-resolution information includes edge contour information and style information of the image;

Wherein replacing the first I network layers of the original stylized image generation network with the first I network layers of the realistic image generation network comprises:

replacing the low-resolution information learned by the first I network layers of the original stylized image generation network with the low-resolution information learned by the first I network layers of the realistic image generation network.
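A minimal Python sketch of this layer replacement, under the assumption that each network can be represented as an ordered list of per-layer parameters (shown here as placeholder strings). The function name `fuse_first_layers` and the list representation are hypothetical and serve only to illustrate the indexing.

```python
from typing import List, Sequence


def fuse_first_layers(realistic_layers: Sequence[str],
                      stylized_layers: Sequence[str],
                      i: int) -> List[str]:
    """Return a fused network whose first i layers come from the realistic
    network and whose remaining layers come from the stylized network."""
    n = len(realistic_layers)
    if len(stylized_layers) != n:
        raise ValueError("both networks must have the same number of layers")
    if not 1 <= i < n:
        raise ValueError("i must satisfy 1 <= i < N")
    return list(realistic_layers[:i]) + list(stylized_layers[i:])


# Placeholder layer parameters for two networks with N = 6 layers.
realistic = [f"real_{k}" for k in range(1, 7)]
stylized = [f"style_{k}" for k in range(1, 7)]

fused = fuse_first_layers(realistic, stylized, i=2)
print(fused)
```

In this sketch a larger `i` leaves fewer layers of the stylized network in the fused result, which is consistent with the stated negative correlation between I and the degree of stylization.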
The device according to claim 10, wherein the realistic image generation network and the original stylized image generation network each have N network layers, N being a positive integer, and wherein fusing the realistic image generation network with the original stylized image generation network further comprises:

Replacing the last J network layers of the original stylized image generation network with the last J network layers of the realistic image generation network to obtain the target stylized image generation network, where J∈[1,N), and the value of J is negatively correlated with the degree of stylization of the target stylized image generated by the target stylized image generation network.
The device according to claim 13, characterized in that the last J network layers of the realistic image generation network and of the original stylized image generation network are used to learn high-resolution information of images, and the high-resolution information includes detail information of the image;

Wherein replacing the last J network layers of the original stylized image generation network with the last J network layers of the realistic image generation network comprises:

replacing the high-resolution information learned by the last J network layers of the original stylized image generation network with the high-resolution information learned by the last J network layers of the realistic image generation network.
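The replacement of the last J network layers can be sketched in the same spirit. The list-of-placeholders representation, the function names, and in particular the use of "fraction of layers still taken from the stylized network" as a crude proxy for the degree of stylization are assumptions of this sketch, not definitions from the disclosure.

```python
from typing import Dict, List, Sequence


def fuse_last_layers(realistic_layers: Sequence[str],
                     stylized_layers: Sequence[str],
                     j: int) -> List[str]:
    """Return a fused network whose last j layers come from the realistic
    network and whose earlier layers come from the stylized network."""
    n = len(realistic_layers)
    if len(stylized_layers) != n:
        raise ValueError("both networks must have the same number of layers")
    if not 1 <= j < n:
        raise ValueError("j must satisfy 1 <= j < N")
    return list(stylized_layers[: n - j]) + list(realistic_layers[n - j:])


def stylized_fraction(fused: Sequence[str], prefix: str = "style_") -> float:
    """Crude proxy (an assumption): share of layers from the stylized network."""
    return sum(1 for name in fused if name.startswith(prefix)) / len(fused)


realistic = [f"real_{k}" for k in range(1, 7)]
stylized = [f"style_{k}" for k in range(1, 7)]

# The proxy shrinks as J grows from 1 to N - 1.
fractions: Dict[int, float] = {
    j: stylized_fraction(fuse_last_layers(realistic, stylized, j))
    for j in range(1, 6)
}
```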
The device according to any one of claims 10 to 14, wherein the original stylized image generation network is obtained by performing transfer learning on the realistic image generation network based on stylized sample images, and the stylized sample images have the target style.
The device according to claim 15, wherein the transfer learning of the realistic image generation network based on the stylized sample image comprises:

Obtaining the realistic image generation network and a stylized sample image with the target style;

Using the stylized sample image, transfer learning is performed on the realistic image generation network to obtain the original stylized image generation network.
The device according to any one of claims 10 to 14, wherein the realistic image generation network is obtained by training a generative adversarial network model for image generation whose resolution increases level by level, the realistic image generation network has N network layers, every n network layers represent one resolution level, and the realistic image generation network is used to generate images of different resolutions level by level, where N is a positive integer and n∈[1,N).
The device according to any one of claims 10 to 14, wherein there are multiple pairs of the paired images, the multiple pairs of paired images are used to train an initial network to obtain a target stylized network, and the target stylized network is used to transform an input image into an image with the target style.
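To make the relation between N, n and the resolution levels concrete, the following Python sketch maps each layer to a resolution. The base resolution of 4, the doubling of resolution per level, and the requirement that N be divisible by n are assumptions chosen for illustration; the claims do not specify them.

```python
from typing import List, Tuple


def resolution_schedule(num_layers: int, layers_per_level: int,
                        base_resolution: int = 4) -> List[Tuple[int, int]]:
    """Map each layer (1-indexed) to the resolution of its level.

    Assumes every `layers_per_level` consecutive layers form one resolution
    level and that the resolution doubles from one level to the next.
    """
    if not 1 <= layers_per_level < num_layers:
        raise ValueError("layers_per_level must satisfy 1 <= n < N")
    if num_layers % layers_per_level != 0:
        raise ValueError("this sketch assumes N is divisible by n")
    schedule = []
    for layer in range(1, num_layers + 1):
        level = (layer - 1) // layers_per_level
        schedule.append((layer, base_resolution * 2 ** level))
    return schedule


# N = 6 layers, n = 2 layers per resolution level.
schedule = resolution_schedule(num_layers=6, layers_per_level=2)
print(schedule)
```

Under these assumptions the earliest layers operate at the lowest resolution and the last layers at the highest.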
An electronic device, characterized in that it comprises:

A processor;

A memory for storing processor-executable instructions;

Wherein, the processor is configured to invoke instructions stored in the memory to execute the method according to any one of claims 1-9.
A computer-readable storage medium on which computer program instructions are stored, wherein the computer program instructions implement the method according to any one of claims 1 to 9 when executed by a processor.
A computer program, comprising computer-readable code, wherein when the computer-readable code is run in an electronic device, a processor in the electronic device executes the method according to any one of claims 1 to 9.