WO2023087656A1

WO2023087656A1 - Image generation method and apparatus

Info

Publication number: WO2023087656A1
Application number: PCT/CN2022/094971
Authority: WO
Inventors: 刘明聪; 李强; 秦泽奎; 张国鑫; 万鹏飞; 郑文
Original assignee: 北京达佳互联信息技术有限公司
Priority date: 2021-11-18
Filing date: 2022-05-25
Publication date: 2023-05-25
Also published as: CN114202456A

Abstract

The present invention relates to an image generation method and apparatus, an electronic device, and a storage medium. The method comprises: obtaining a preset object code and a target style code of a target style; performing style fusion processing on the target style code and the preset object code on the basis of network fusion parameters corresponding to a preset number of network layers in a style fusion network to obtain a target style fusion code, wherein the network fusion parameters are determined on the basis of fusion data corresponding to the preset number of network layers and a target fusion weight, and the target fusion weight is obtained by performing fusion weight learning on the basis of the target style code and the preset object code; and inputting the target style fusion code into a target image generation network for image generation processing to obtain a preset object style image corresponding to the target style.

Description

Image generation method and device

Cross References to Related Applications

This application claims the priority of Chinese Patent Application No. 202111371705.0 submitted on November 18, 2021, and the content disclosed in the above Chinese Patent Application is cited in its entirety as a part of this application.

technical field

The present disclosure relates to the technical field of image processing, and in particular to an image generation method, device, electronic equipment, and storage medium.

Background technique

With the continuous development of image processing technology, in the field of image applications, the image style conversion function has become a new interesting way to play. The image style conversion technology can generate an image with a target style based on a given object image such as a human face, so that it has an artistic effect similar to the target style, thereby realizing the conversion of the object image into an object of different styles such as animation, oil painting, and pencil drawing. style image.

In related technologies, in the process of generating an image of an object, it is necessary to pre-train a style transfer network with a style image generation function, but training a style transfer network often requires a large number of object style images as training images, and in order to ensure that the style transfer network The generated object-style image has the same features as the original object, and does not cause image distortion such as distortion and deformation. It is necessary to ensure the quality of the training image.

Contents of the invention

The present disclosure provides an image generation method, device, electronic equipment, and storage medium, which can quickly generate high-quality object-style images, and improve the efficiency of self-adaptive generation of multi-style object-style images. The disclosed technical scheme is as follows:

According to a first aspect of an embodiment of the present disclosure, an image generation method is provided, including:

Obtain the target style encoding of the preset object encoding and target style;

Based on network fusion parameters corresponding to a preset number of network layers in the style fusion network, perform style fusion processing on the target style code and the preset object code to obtain a target style fusion code; the network fusion parameters are based on the The fusion data corresponding to a preset number of network layers and the target fusion weight are determined, and the target fusion weight is obtained by performing fusion weight learning based on the target style code and the preset object code;

The target style fusion code is input into the target image generation network for image generation processing, and the preset object style image corresponding to the target style is obtained.

In some embodiments, performing style fusion processing on the target style code and the preset object code based on the network fusion parameters corresponding to a preset number of network layers in the style fusion network, and obtaining the target style fusion code includes:

Acquiring target fusion control data, the target fusion control data is used to control the fusion position of the target style code and the preset object code in the style fusion network;

determining fusion data corresponding to the preset number of network layers according to the target fusion control data;

Perform splicing processing on the target style code and the preset object code to obtain the target spliced code;

performing fusion weight learning based on the target splicing code to obtain the target fusion weight;

performing weighting processing on the fusion data and the target fusion weight to obtain the network fusion parameters;

Perform style fusion processing on the target style code and the preset object code in the preset number of network layers based on the network fusion parameters to obtain the target style fusion code.

In some embodiments, the determining the fusion data corresponding to the preset number of network layers according to the target fusion control data includes:

Comparing the number of layers corresponding to the preset number of network layers with the target fusion control data to obtain a comparison result;

According to the comparison result, determine fusion data corresponding to the preset number of network layers.

In some embodiments, said obtaining the target style encoding of the target style includes:

Acquiring a reference style image of the target style;

Inputting the reference style image into the style encoding network for style encoding processing to obtain the target style encoding.

In some embodiments, the method also includes:

Obtaining a positive sample style image pair and a negative sample style image pair of the target style;

Inputting the positive sample style image pair and the negative sample style image pair into the style coding network to be trained for style coding processing to obtain the sample style codes corresponding to the positive sample style image pair and the negative sample style image pair;

Inputting the sample style code into the perceptual network to be trained for perceptual processing, and obtaining the respective sample perceptual feature information corresponding to the positive sample style image pair and the negative sample style image pair;

determining comparative loss information according to the perceptual feature information of the sample;

training the style encoding network to be trained and the perception network to be trained based on the comparative loss information;

The trained style coding network to be trained is used as the style coding network.

Randomly generate an initial style code based on a first preset distribution;

Inputting the initial style code into the first multi-layer perceptual network for perceptual processing to obtain the target style code.

In some embodiments, the preset object code includes acquiring through the following steps:

Randomly generating an initial object code based on a second preset distribution;

Inputting the initial object code into the second multi-layer perceptual network for perceptual processing to obtain the preset object code.

In some embodiments, the method also includes:

Acquiring the first sample style code of the target style, the sample object code, the second sample style code of the non-target style, the preset style object image and the preset object image;

Based on the sample network fusion parameters corresponding to the preset number of network layers to be trained in the style fusion network to be trained, perform style fusion processing on the first sample style code and the sample object code to obtain a sample style fusion code;

Inputting the sample style fusion code into the image generation network to be trained for image generation processing to obtain the sample object style image corresponding to the target style;

Input the sample object style image, the preset style object image, the preset object image, the first sample style code and the second sample style code into the discriminant network to be trained for style discrimination processing, and obtain the target discrimination information;

determining target loss information according to the target discrimination information;

Training the style fusion network to be trained, the image generation network to be trained, and the discrimination network to be trained based on the target loss information;

The trained style fusion network to be trained is used as the style fusion network, and the trained image generation network to be trained is used as the target image generation network.

In some embodiments, the sample style fusion coding includes a first style fusion coding and a second style fusion coding; the sample network fusion parameters corresponding to a preset number of network layers to be trained based on the style fusion network to be trained, for Performing style fusion processing on the first sample style code and the sample object code to obtain the sample style fusion code includes:

Acquiring first fusion control data and second fusion control data, the first fusion control data is used to control the first sample style code and the sample object code, from the first style fusion network to be trained The first network layer starts to fuse; the second fusion control data is used to control the first sample style code not to participate in fusion in the style fusion network to be trained;

According to the first fusion control data and the second fusion control data, respectively determine the first sample fusion data and the second sample fusion data corresponding to the preset number of network layers to be trained;

Concatenating the first sample style code and the sample object code to obtain a sample concatenation code;

Perform fusion weight learning based on the sample splicing code to obtain the sample fusion weight;

performing weighting processing on the first sample fusion data, the second sample fusion data and the sample fusion weight respectively, to obtain the first sample network fusion parameters and the second sample network fusion parameters;

Based on the first sample network fusion parameters and the second sample network fusion parameters, respectively perform style fusion processing on the first sample style code and the sample object code in the preset number of network layers to be trained , to obtain the first style fusion code and the second style fusion code.

In some embodiments, the sample object style image includes a first sample object style image corresponding to the first style fusion encoding and a second sample object style image corresponding to the second style fusion encoding; the to-be-trained The discriminant network includes an object discriminant network, a style object discriminant network, and a style code discriminant network; the target discriminant information includes object discriminant information, style object discriminant information, and style code discriminant information;

The sample object style image, the preset style object image, the preset object image, the first sample style code and the second sample style code are input into the discriminant network to be trained for style discrimination processing, and the obtained Target discrimination information includes:

inputting the second sample object style image and the preset object image into the object discrimination network for object discrimination processing to obtain object discrimination information;

inputting the first sample object style image and the preset style object image into the style object discrimination network to perform style object discrimination processing to obtain style object discrimination information;

Inputting the second sample object style image, the first sample style code and the second sample style code into the style code discrimination network for style code discrimination processing to obtain style code discrimination information.

According to a second aspect of an embodiment of the present disclosure, an image generation method is provided, including:

acquiring a first original object image of a first target object;

inputting the first original object image into a first style conversion network to perform style conversion processing to obtain a first target object style image corresponding to the first target object;

The first style conversion network is a preset object style image based on the first sample object image and the target style generated by any image generation method provided in the first aspect, and the first preset image generation network is subjected to adversarial training to obtain of.

According to a third aspect of an embodiment of the present disclosure, an image generation method is provided, including:

Obtaining a second original object image and a target style label of a second target object;

inputting the second original object image and the target style label into a second style conversion network for style conversion processing, to obtain a second target object style image corresponding to the second target object;

The second style conversion network is based on the second sample object image, a variety of target style labels and a variety of target styles of preset object style images generated by any image generation method provided in the first aspect, for the second preset The target image generation network is obtained by adversarial training.

According to a fourth aspect of an embodiment of the present disclosure, an image generation device is provided, including:

an encoding acquisition module configured to perform target style encoding for acquiring preset object encodings and target styles;

The first style fusion processing module is configured to perform style fusion processing on the target style code and the preset object code based on network fusion parameters corresponding to a preset number of network layers in the style fusion network to obtain a target style fusion Coding; the network fusion parameters are determined based on fusion data corresponding to the preset number of network layers and target fusion weights, and the target fusion weights are fusion weights based on the target style coding and the preset object coding learned;

The first image generation processing module is configured to input the target style fusion code into the target image generation network for image generation processing, and obtain a preset object style image corresponding to the target style.

In some embodiments, the first style fusion processing module includes:

A target fusion control data acquisition unit configured to acquire target fusion control data, the target fusion control data is used to control the fusion position of the target style code and the preset object code in the style fusion network;

The fused data determination unit is configured to determine fused data corresponding to the preset number of network layers according to the target fused control data;

The first splicing processing unit is configured to perform splicing processing on the target style code and the preset object code to obtain the target spliced code;

The first fusion weight learning unit is configured to perform fusion weight learning based on the target splicing code to obtain the target fusion weight;

a first weighting processing unit configured to perform weighting processing on the fusion data and the target fusion weight to obtain the network fusion parameters;

The first style fusion processing unit is configured to perform style fusion processing on the target style code and the preset object code in the preset number of network layers based on the network fusion parameters to obtain the target style fusion encoding.

In some embodiments, the fusion data determination unit includes:

The comparison unit is configured to compare the number of layers corresponding to the preset number of network layers with the target fusion control data to obtain a comparison result;

The fused data determining subunit is configured to determine fused data corresponding to the preset number of network layers according to the comparison result.

In some embodiments, the code acquisition module includes:

a reference style image acquisition unit configured to acquire a reference style image of the target style;

The style encoding processing unit is configured to input the reference style image into the style encoding network for style encoding processing to obtain the target style encoding.

In some embodiments, the device also includes:

The sample image acquisition module is configured to acquire the positive sample style image pair and the negative sample style image pair of the target style;

The style coding processing module is configured to perform style coding processing by inputting the positive sample style image pair and the negative sample style image pair into the style coding network to be trained, to obtain the positive sample style image pair and the negative sample style image pair The respective sample style codes;

The perceptual processing module is configured to input the sample style code into the perceptual network to be trained for perceptual processing, and obtain the sample perceptual feature information corresponding to the positive sample style image pair and the negative sample style image pair;

A comparison loss information determination module configured to determine the comparison loss information according to the sample perception feature information;

A first network training module configured to perform training of the style coding network to be trained and the perception network to be trained based on the comparison loss information;

The style coding network determining module is configured to execute the trained style coding network to be trained as the style coding network.

In some embodiments, the code acquisition module includes:

an initial style code generating unit configured to randomly generate an initial style code based on a first preset distribution;

The first perceptual processing unit is configured to input the initial style code into the first multi-layer perceptual network for perceptual processing to obtain the target style code.

In some embodiments, the code acquisition module includes:

The initial object code generating unit is configured to randomly generate the initial object code based on the second preset distribution;

The second perceptual processing unit is configured to input the initial object code into the second multi-layer perceptual network for perceptual processing to obtain the preset object code.

In some embodiments, the device also includes:

A sample data acquisition module configured to acquire the first sample style code of the target style, the sample object code, the second sample style code of the non-target style, the preset style object image and the preset object image;

The second style fusion processing module is configured to perform style encoding on the first sample style code and the sample object code based on sample network fusion parameters corresponding to a preset number of network layers to be trained in the style fusion network to be trained Fusion processing to obtain sample style fusion coding;

The second image generation processing module is configured to input the sample style fusion code into the image generation network to be trained for image generation processing, and obtain the sample object style image corresponding to the target style;

a style discrimination processing module configured to input the sample object style image, the preset style object image, the preset object image, the first sample style code and the second sample style code into the discriminator to be trained The network performs style discrimination processing to obtain target discrimination information;

A target loss information determination module configured to determine target loss information according to the target discrimination information;

The second network training module is configured to perform training of the style fusion network to be trained, the image generation network to be trained, and the discrimination network to be trained based on the target loss information;

The network determination module is configured to use the trained style fusion network to be trained as the style fusion network, and use the trained image generation network to be trained as the target image generation network.

In some embodiments, the style fusion coding of the sample includes a first style fusion coding and a second style fusion coding; the second style fusion processing module includes:

The sample fusion control data acquisition unit is configured to perform acquisition of first fusion control data and second fusion control data, the first fusion control data is used to control the first sample style code and the sample object code, from The first network layer in the style fusion network to be trained starts to fuse; the second fusion control data is used to control the first sample style code not to participate in fusion in the style fusion network to be trained;

The sample fusion data determination unit is configured to determine the first sample fusion data and the second fusion data corresponding to the preset number of network layers to be trained according to the first fusion control data and the second fusion control data. Sample fusion data;

A second splicing processing unit configured to perform splicing processing on the first sample style code and the sample object code to obtain a sample spliced code;

The second fusion weight learning unit is configured to perform fusion weight learning based on the sample splicing code to obtain the sample fusion weight;

The second weighting processing unit is configured to perform weighting processing on the first sample fusion data, the second sample fusion data and the sample fusion weight respectively, to obtain the first sample network fusion parameters and the second sample network fusion parameter;

The second style fusion processing unit is configured to perform, based on the network fusion parameters of the first sample and the network fusion parameters of the second sample, the first sample style in the preset number of network layers to be trained respectively. performing style fusion processing on the coding and the sample object coding to obtain the first style fusion coding and the second style fusion coding.

The style discrimination processing module includes:

an object discrimination processing unit configured to perform object discrimination processing by inputting the second sample object style image and the preset object image into the object discrimination network to obtain object discrimination information;

a style object discrimination processing unit configured to perform style object discrimination processing by inputting the first sample object style image and the preset style object image into the style object discrimination network to obtain style object discrimination information;

a style encoding discrimination processing unit configured to perform style encoding discrimination processing by inputting the second sample object style image, the first sample style code and the second sample style code into the style coding discrimination network to obtain Style encodes discriminative information.

According to a fifth aspect of an embodiment of the present disclosure, an image generating device is provided, including:

an original object image acquisition module configured to perform acquisition of a first original object image of the first target object;

The first style conversion processing module is configured to perform style conversion processing by inputting the first original object image into a first style conversion network to obtain a first target object style image corresponding to the first target object;

According to a sixth aspect of the embodiments of the present disclosure, there is provided an image generation device, including:

a data acquisition module configured to perform acquisition of a second original object image and a target style label of a second target object;

The second style conversion processing module is configured to input the second original object image and the target style label into a second style conversion network for style conversion processing, and obtain the second target object style corresponding to the second target object image;

According to a seventh aspect of the embodiments of the present disclosure, there is provided an electronic device, including: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement The method as described in any one of the first aspect, the second aspect and the third aspect above.

According to the eighth aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium. When the instructions in the storage medium are executed by the processor of the electronic device, the electronic device can execute the first method of the embodiments of the present disclosure. Aspect, the method described in any one of the second aspect and the third aspect.

According to a ninth aspect of the embodiments of the present disclosure, there is provided a computer program product containing instructions, which, when run on a computer, causes the computer to execute any one of the first aspect, the second aspect and the third aspect of the embodiments of the present disclosure. method described in the item.

In the process of generating a stylized image of a certain type of object (preset object style image), the stylized image is decoupled into two parts, object encoding and style encoding, and in the style fusion network, the target fusion weight is combined with the preset The network fusion parameters determined by the fusion data corresponding to the number of network layers are used to perform style fusion processing on the target style code and the preset object code. Through the fusion of the two, the object characteristics that can represent a certain type of object The target style fusion coding that effectively integrates the target style, and the target fusion weight is based on the target style coding and the preset object coding for fusion weight learning, which can realize adaptive adjustment of the fusion weight of the object under different target styles, and then better fusion Object style encoding under the target style. On the basis of greatly improving the stylized effect and stylized image quality, it can greatly improve the adaptive generation efficiency of multi-style object style images.

Description of drawings

Fig. 1 is a schematic diagram showing an application environment according to an exemplary embodiment;

Fig. 2 is a flowchart of an image generation method shown according to an exemplary embodiment;

Fig. 3 is a schematic network structure diagram of a style coding network provided according to an exemplary embodiment;

Fig. 4 is a flowchart showing a pre-trained style coding network according to an exemplary embodiment;

Fig. 5 is a network fusion parameter corresponding to a preset number of network layers in a style fusion network according to an exemplary embodiment, and performs style fusion processing on target style coding and preset object coding to obtain target style fusion coding flow chart;

Fig. 6 is a flowchart showing a pre-trained target image generation network and style fusion network according to an exemplary embodiment;

Fig. 7 shows a style fusion process based on the sample network fusion parameters corresponding to the preset number of network layers to be trained in the style fusion network to be trained, according to an exemplary embodiment, to perform style fusion processing on the first sample style coding and sample object coding , to obtain the flow chart of sample style fusion coding;

Fig. 8 is a schematic diagram of a training style fusion network and a target image generation network according to an exemplary embodiment;

Fig. 9 is a flowchart of an image generation method according to an exemplary embodiment;

Fig. 10 is a flowchart of another image generation method according to an exemplary embodiment;

Fig. 11 is a block diagram of an image generating device according to an exemplary embodiment;

Fig. 12 is a block diagram of another image generating device according to an exemplary embodiment;

Fig. 13 is a block diagram of another image generating device according to an exemplary embodiment;

Fig. 14 is a block diagram of an electronic device for image generation according to an exemplary embodiment;

Fig. 15 is a block diagram showing another electronic device for image generation according to an exemplary embodiment.

Detailed ways

It should be noted that the terms "first" and "second" in the specification and claims of the present disclosure and the above drawings are used to distinguish similar objects, but not necessarily used to describe a specific sequence or sequence. It is to be understood that the data so used are interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein can be practiced in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with aspects of the present disclosure as recited in the appended claims.

It should be noted that the user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for display, data for analysis, etc.) involved in this disclosure are authorized by the user. Or information and data fully authorized by the parties.

Please refer to FIG. 1 . FIG. 1 is a schematic diagram showing an application environment according to an exemplary embodiment. As shown in FIG. 1 , the application environment may include a terminal 100 and a server 200 .

The terminal 100 can be used to provide a stylized image (object style image) generation service of a target object for any user. In some embodiments, the terminal 100 may include, but not limited to, smartphones, desktop computers, tablet computers, notebook computers, smart speakers, digital assistants, augmented reality (augmented reality, AR)/virtual reality (virtual reality, VR) devices, Electronic devices such as smart wearable devices may also be software running on the above-mentioned electronic devices, such as application programs. In some embodiments, the operating system running on the electronic device may include but not limited to Android system, IOS system, linux, windows and so on.

In some embodiments, the server 200 can provide background services for the terminal 100, pre-generate object style images for training the style transfer network, and train the object images that can be used to convert object images into stylized images (object style images of the target style) . In some embodiments, the server 200 can be an independent physical server, or a server cluster or a distributed system composed of multiple physical servers, and can also provide cloud services, cloud databases, cloud computing, cloud functions, cloud storage, Cloud servers for basic cloud computing services such as network services, cloud communications, middleware services, domain name services, security services, CDN (Content Delivery Network, content distribution network), and big data and artificial intelligence platforms.

In addition, it should be noted that what is shown in FIG. 1 is only an application environment provided by the present disclosure, and in actual application, other application environments may also be included, for example, more terminals may be included.

In the embodiment of this specification, the above-mentioned terminal 100 and server 200 may be directly or indirectly connected through wired or wireless communication, which is not limited in this disclosure.

Fig. 2 is a flowchart of an image generation method according to an exemplary embodiment. As shown in Fig. 2 , the image generation method is used in electronic devices such as terminals and servers, and includes the following steps.

In step S201, a preset object code and a target style code of a target style are obtained.

In some embodiments, the target style can be any image style. The division of image style can be divided into multiple ways according to actual application requirements. In some embodiments, the target style may include but not limited to image styles such as animation, oil painting, and pencil drawing.

In some embodiments, the target style code may be coded information that can characterize the style features of the target style. The preset object code may be coded information capable of characterizing object features of a certain type of object. In some embodiments, the objects may include, but are not limited to, human faces, cat faces, dog faces, and other objects requiring style conversion.

In some embodiments, the above-mentioned target style may be an image style extracted from a certain reference style image, and correspondingly, the above-mentioned target style encoding for obtaining the target style may include:

Obtain the reference style image of the target style;

The reference style image is input into the style encoding network for style encoding processing, and the target style encoding is obtained.

In some embodiments, the style image may be an image with a certain style. Taking the subject as a human face as an example, and the target style as an example of an anime style, the style image is an anime human face image.

In some embodiments, the style coding network can be obtained by comparing the style coding network to be trained with the target style-based positive sample style image pair and negative sample style image pair.

In some embodiments, the network structure of the style coding network can be preset in combination with actual applications. In some embodiments, as shown in FIG. 3 , FIG. 3 is a schematic network structure diagram of a style coding network provided according to an exemplary embodiment. In some embodiments, the style encoding network may include sequentially connecting a convolutional neural network, a style feature extraction network, a feature splicing network, and a multi-layer perceptual network.

In some embodiments, the convolutional neural network can be used to extract the image feature information of the reference style image; the style feature extraction network can be used to extract the style feature information in the image feature information; the feature splicing network can be used to stitch the style feature information into a long vector of a preset dimension (the preset dimension is consistent with the input dimension of the multi-layer perceptual network); the above-mentioned multi-layer perceptual network can include two parts of the multi-layer perceptual network connected in sequence, and the first part of the multi-layer perceptual network can be used to reduce Dimensions, and convert long vectors of preset dimensions into more dense style codes, which makes it easier to train image generation networks. The second part of the multi-layer perceptual network can be used to reduce the dimensionality of the encoded information output by the first part of the multi-layer perceptual network, and transform the encoded information from (encoding space) to the distribution space where the image generation network is located.

In some embodiments, the specific network structures of convolutional neural network, style feature extraction network, feature splicing network and multi-layer perceptual network can also be set in combination with practical applications.

In some embodiments, the above method may also include: the step of pre-training the style coding network, as shown in FIG. 4 , the pre-training of the style coding network may include the following steps:

In step S401, a positive sample style image pair and a negative sample style image pair of the target style are obtained;

In step S403, the positive sample style image pair and the negative sample style image pair are input into the style coding network to be trained for style coding processing, and the respective corresponding sample style codes of the positive sample style image pair and the negative sample style image pair are obtained;

In step S405, the sample style code is input into the perception network to be trained for perceptual processing, and the respective sample perception feature information corresponding to the positive sample style image pair and the negative sample style image pair are obtained;

In step S407, according to the sample perceptual feature information, determine the contrast loss information;

In step S409, based on the comparison loss information, the style encoding network to be trained and the perception network to be trained are trained;

In step S411, the trained style coding network to be trained is used as the style coding network.

In some embodiments, the network structure of the style coding network to be trained is the same as that of the style coding network, but the network parameters are different.

In some embodiments, a reference style image of a target style can be obtained, and by performing multiple affine transformations on the reference style image, correspondingly, the style image after each affine transformation is combined with the reference style image A pair of positive sample style images; in some embodiments, the translation amount of multiple affine changes can be different.

In some embodiments, a plurality of non-target style reference style images may be obtained, and the plurality of non-target style reference images are respectively combined with target style reference style images to form multiple pairs of target style negative sample style image pairs. In some embodiments, multiple affine transformations may be performed on a reference style image of a non-target style, and the style images after multiple affine transformations are respectively combined with the reference style image of the target style to form multiple pairs of negative Sample style image pair.

In some embodiments, in order to use the self-supervised learning strategy to train the above-mentioned style coding network to be trained, the corresponding sample style codes of the positive sample style image pair and the negative sample style image pair output by the style coding network to be trained can be input to the training The perceptual network performs perceptual processing to obtain the sample perceptual feature information corresponding to the positive sample style image pair and the negative sample style image pair, and determines the comparative loss information according to the sample perceptual feature information.

In some embodiments, the preset contrast loss function may be combined in the process of determining the contrast loss information according to the sample perceptual feature information. In some embodiments, the preset contrast loss function may be any kind of contrast loss function. In some In an embodiment, for example, the NT-Xent contrast loss function (the normalized temperature-scaled cross entropy loss, the normalized temperature-scaled cross entropy loss function).

In some embodiments, the above-mentioned training of the style encoding network to be trained and the perception network to be trained based on the comparison loss information may include: updating the network parameters in the style encoding network to be trained and the perception network to be trained based on the comparison loss information; then, based on The updated style coding network to be trained and the perception network to be trained repeatedly input the positive sample style image pair and the negative sample style image pair into the style coding network to be trained for style coding processing, and obtain the positive sample style image pair and the negative sample style image pair respectively. The corresponding sample style is encoded to a training iterative operation of updating network parameters in the style encoding network to be trained and the perception network to be trained based on the comparison loss information, until the first preset convergence condition is reached.

In some embodiments, when the first preset convergence condition is reached, the current style encoding network to be trained (the trained style encoding network to be trained) is used as the style encoding network.

In some embodiments, reaching the first preset convergence condition may be that the number of training iteration operations reaches the first preset training number. In some embodiments, reaching the first preset convergence condition may also be that the contrast loss information is smaller than the first preset threshold. In the embodiment of this specification, the first preset training times and the first preset threshold can be preset in combination with the training speed and accuracy of the network in practical applications.

In the above-mentioned embodiment, the positive sample style image pair and the negative sample style image pair based on the target style are obtained by comparative training of the style coding network to be trained, which can realize the training of the self-supervised style coding network, and effectively ensure that the trained style coding The representation accuracy of the network on style features. And based on the trained style coding network, the target style coding of the target style is extracted from the style image of the target style, which can effectively improve the accuracy of the target style coding for the target style representation.

In some embodiments, the above-mentioned target style may also be an image style obtained by random sampling. Correspondingly, the above-mentioned target style encoding for obtaining the target style includes:

Randomly generate an initial style code based on a first preset distribution;

The initial style code is input into the first multi-layer perceptual network for perceptual processing, and the target style code is obtained.

In some embodiments, the first preset distribution may be a preset encoding distribution, and in some embodiments, the first preset distribution may include but not limited to a Gaussian distribution.

In some embodiments, the first multi-layer perceptual network can be used to reduce the dimensionality of the initial style code, and transform the initial style code from (encoding space, such as Gaussian distribution space) to the distribution space where the image generation network is located.

In the above-mentioned embodiment, by randomly generating the initial style code, the generation efficiency and diversity of the style code can be improved, and the flexibility of generating the style image of the object can be greatly improved; and after the initial style code is randomly generated based on the first preset distribution, combined with the first multiple Layer-aware networks perform perceptual processing, making it easier to train image generation networks.

In some embodiments, the above-mentioned preset object code includes obtaining by the following steps:

The initial object code is input into the second multi-layer perceptual network for perceptual processing, and the preset object code is obtained.

In some embodiments, the second preset distribution may be a preset encoding distribution, and in some embodiments, the second preset distribution may include but not limited to a Gaussian distribution.

In some embodiments, the second multi-layer perceptual network can be used to reduce the dimensionality of the initial object encoding and transform the initial object encoding from (encoding space, eg Gaussian distribution space) to the distribution space where the image generation network resides.

In the above embodiment, by randomly generating the initial object code, the generation efficiency of the object code can be improved, and a large number of object style images can be quickly generated for a certain style, which effectively improves the image generation efficiency of a certain style, and based on the second preset After the initial object encoding is randomly generated by the distribution, combined with the second multi-layer perceptual network for perceptual processing, it can be easier to train the image generation network.

In step S203, based on the network fusion parameters corresponding to the preset number of network layers in the style fusion network, the style fusion processing is performed on the target style code and the preset object code to obtain the target style fusion code.

In some embodiments, the style fusion network can be used to perform style fusion processing on target style codes and preset object codes. In some embodiments, the style fusion network may include a preset number of network layers; in some embodiments, the number of network layers (preset number) may be set in conjunction with actual applications.

In some embodiments, the aforementioned style fusion network may be a style fusion network capable of regulating the degree of fusion and adaptively adjusting fusion weights. In some embodiments, the degree of fusion can be controlled by regulating the fusion position of the target style code and the preset object code in the style fusion network (that is, from which network layer to start fusion). In some embodiments, fusion weight learning of target style codes and preset object codes can be performed to achieve adaptive adjustment of fusion weights of objects under different target styles.

In some embodiments, the above-mentioned network fusion parameters may be determined based on fusion data corresponding to a preset number of network layers and target fusion weights, and the above-mentioned target fusion weights are obtained by performing fusion weight learning based on target style coding and preset object coding .

In some embodiments, as shown in FIG. 5, the above-mentioned network fusion parameters corresponding to the preset number of network layers in the style fusion network perform style fusion processing on the target style code and the preset object code, and the target style fusion code can be obtained. Include the following steps:

In step S501, target fusion regulation data is acquired.

In some embodiments, the above target fusion control data is used to control the fusion position of the target style code and the preset object code in the style fusion network.

In step S503, according to the target fusion control data, the fusion data corresponding to a preset number of network layers are determined.

In some embodiments, the determination of the fusion data corresponding to the preset number of network layers according to the target fusion control data may include:

Comparing the number of layers corresponding to the preset number of network layers with the target fusion control data to obtain the comparison result;

According to the comparison result, the fused data corresponding to the preset number of network layers is determined.

In some embodiments, the fusion data corresponding to each network layer may represent whether the target style code participates in fusion at the network layer. In some embodiments, the preset number of network layers included in the style fusion network may be a preset number of sequentially arranged network layers, such as the 0th layer to the nth layer; in some embodiments, the comparison result indicates that the network If the number of layers is less than the target fusion control data, the fusion data corresponding to this network layer is 0, that is, the target style code does not participate in fusion at this network layer; otherwise, when the comparison result indicates that the number of layers of the network layer is greater than or equal to the target fusion In the case of regulating data, the fusion data corresponding to this network layer is 1, that is, the target style coding participates in fusion at this network layer.

In the above-mentioned embodiment, based on the fusion data determined by the target fusion control data to control whether the target style coding participates in the fusion in multiple network layers, the controllable style feature fusion can be realized, and then the subsequent generation can be guaranteed to have a higher similarity with the natural object. High object style images with different degrees of stylization, flexible control of the strength of object stylization, better respond to the needs of different scenarios, and greatly improve the image quality of generated object style images and the generation efficiency of object style images .

In step S505, concatenate the target style code and the preset object code to obtain the target concatenated code.

In step S507, fusion weight learning is performed based on the target splicing code to obtain target fusion weights.

In some embodiments, the target concatenation code can be input into the fully connected layer to perform fusion weight learning to obtain the above target fusion weight. In some embodiments, the target fusion weight can be used to control the fusion ratio of the target style code and the preset object code in each fusion layer.

In the above-mentioned embodiment, the fusion weight learning is performed in combination with target splicing coding including target style coding and preset object coding, so that the learned target fusion weights are adaptive to different types of objects and styles, and then the target style is better fused. Object image below.

In step S509, the fusion data and target fusion weights are weighted to obtain network fusion parameters.

In some embodiments, both the target fusion weight and the fusion data are data in the form of a matrix. Correspondingly, the target fusion weight and the corresponding elements in the fusion data can be multiplied to obtain the above network fusion parameters.

In step S511, the target style code and the preset object code are subjected to style fusion processing in a preset number of network layers based on the network fusion parameters to obtain the target style fusion code.

In some embodiments, the network fusion parameters corresponding to the above preset number of network layers can be used as the weight of the target style coding, and the matrix difference after subtracting the network fusion parameters from the matrix whose elements are all ones can be used as the weight of the preset object coding , and based on the weight of the target style code and the weight of the preset object code, the target style code and the preset object code are weighted and summed to obtain the above target fusion code.

In some embodiments, it is assumed that the network fusion layer includes 18 network layers. In some embodiments, when the target fusion control data is 0, the target style code and the preset object code are fused from the first network layer, Correspondingly, the subsequent image generation network outputs a fully stylized object style image; when the target fusion control data is 18, the target style code does not participate in the fusion; correspondingly, the subsequent image generation network outputs an unstylized natural object Image; when the target fusion control data is greater than 1 and less than 18, the target style code and the preset object code are fused from the i-th (target fusion control data) layer, and the low-resolution layer (that is, the first layer to the i-th layer 1) It is not affected by the target style coding; based on the target fusion control data, it can ensure that the target style fusion code retains the same feature information as the natural object, and has different degrees of style characteristics, which in turn can ensure that the subsequent generation is more similar to the natural object. High object style images with different stylization levels can realize flexible control of object stylization strength. Correspondingly, the above-mentioned target fusion control data is used to control the fusion position of the target style code and the preset object code in the style fusion network, which may be the start fusion position of the control target style code and the preset object code in the style fusion network.

In the above-mentioned embodiment, in the style fusion network, combining the target fusion control data to control the fusion position of the target style code and the preset object code in multiple network layers, the regulation of the degree of fusion can be realized . In addition, combined with target splicing coding including target style coding and preset object coding, fusion weight learning is carried out, so that the learned target fusion weight is adaptive to different types of objects and styles, and the adaptive adjustment of objects under different target styles is realized. Fusion weights, and then better integrate the object style code under the target style, which can ensure that the obtained target style fusion code retains the same feature information as the natural object, and has adjustable style features and adaptive fusion weights. In turn, it can ensure that the subsequent generation of object-style images with a high degree of similarity to natural objects and different stylization degrees can achieve flexible control of the strength of object stylization, and improve the efficiency of adaptive generation of multi-style object-style images, which can further improve It can better meet the needs of different scenes, and greatly improve the image quality of the generated object-style images and the generation efficiency of multi-style object-style images.

In step S205, the target style fusion code is input into the target image generation network for image generation processing, and a preset object style image corresponding to the target style is obtained.

In some embodiments, the target image generation network can be used to generate a preset object style image corresponding to the target style. In some embodiments, the above method also includes: the step of pre-training the target image generation network and the style fusion network, in some embodiments, as shown in Figure 6, the step of pre-training the target image generation network and the style fusion network Can include:

In step S601, a first sample style code of a target style, a sample object code, a second sample style code of a non-target style, a preset style object image, and a preset object image are acquired.

In some embodiments, the acquisition manner of the first sample style code and the second sample style code may refer to the above-mentioned manner of obtaining the target style code, which will not be repeated here. For the acquisition method of the sample object code, refer to the above-mentioned method for obtaining the preset object code, which will not be repeated here.

In some embodiments, the preset style object image can be obtained from the stylized object image training set corresponding to the target style. Taking the object as a human face as an example, the preset style can be obtained from the collected face style image training set of the target style face image. In some embodiments, the preset object image may be an original object image. Taking the object as a human face as an example, it may be an image of a real human face.

In step S603, based on the sample network fusion parameters corresponding to the preset number of network layers to be trained in the style fusion network to be trained, the style fusion processing is performed on the first sample style code and the sample object code to obtain the sample style fusion code.

In some embodiments, the style fusion network to be trained may include a preset number of network layers to be trained; the above-mentioned sample network fusion parameters may be determined based on the sample fusion data and sample fusion weights corresponding to the preset number of network layers to be trained, The above-mentioned sample fusion weight is obtained by performing fusion weight learning based on the first sample style coding and sample object coding.

In some embodiments, the above sample style fusion coding includes the first style fusion coding and the second style fusion coding; correspondingly, as shown in FIG. The sample network fusion parameters of the first sample style code and the sample object code are subjected to style fusion processing, and the sample style fusion code obtained may include the following steps:

In step S701, first fusion control data and second fusion control data are obtained.

In some embodiments, the above-mentioned first fusion control data is used to control the first sample style coding and sample object coding, and the fusion starts from the first network layer in the style fusion network to be trained, combined with the implementation of the above-mentioned target fusion control data In the example, the first fusion control data is 1; in some embodiments, the second fusion control data can be used to control the first sample style encoding not to participate in fusion in the style fusion network to be trained, and the above preset number is 18 Taking the embodiment of the example as an example, the second fusion regulation data is 18.

In step S703, according to the first fused control data and the second fused control data, first sample fused data and second sample fused data corresponding to a preset number of network layers to be trained are respectively determined.

In some embodiments, according to the first fusion regulation data, determine the first sample fusion data corresponding to the preset number of network layers to be trained; and according to the second fusion regulation data, determine the preset number of network layers corresponding to the training The second sample fuses the data.

In step S705, concatenate the first sample style code and the sample object code to obtain the sample spliced code;

In step S707, fusion weight learning is performed based on sample splicing coding to obtain sample fusion weights;

In step S709, the first sample fusion data and the second sample fusion data are weighted with the sample fusion weight respectively to obtain the first sample network fusion parameters and the second sample network fusion parameters;

In step S711, based on the network fusion parameters of the first sample and the network fusion parameters of the second sample, style fusion processing is performed on the style coding of the first sample and the coding of the sample object in the preset number of network layers to be trained respectively, to obtain the first One-style fusion encoding and second-style fusion encoding.

In some embodiments, the specific details of the above steps S705 to 711 can refer to the specific details of the above steps S505 to 511 , which will not be repeated here.

In the above embodiment, in the network training process, combining the first fusion control data and the second fusion control data, the sample object encoding can be combined with the first sample style encoding to varying degrees, and the object style is strong. Weak and flexible control, in order to improve the image quality of subsequent generated object-style images and the generation efficiency of object-style images.

In step S605, the sample style fusion code is input into the image generation network to be trained for image generation processing, and the sample object style image corresponding to the target style is obtained.

In some embodiments, the sample object style image may include a first sample object style image corresponding to the first style fusion coding and a second sample object style image corresponding to the second style fusion coding; correspondingly, the above-mentioned sample style fusion coding Inputting the image generation network to be trained for image generation processing to obtain the sample object style image corresponding to the target style includes: inputting the first style fusion code and the second style fusion code into the image generation network to be trained for image generation processing to obtain the first sample Object style image and second sample object style image.

In step S607, input the sample object style image, the preset style object image, the preset object image, the first sample style code and the second sample style code into the discriminant network to be trained for style discriminant processing to obtain target discriminant information.

In some embodiments, the discriminant network to be trained includes an object discriminant network, a style object discriminant network, and a style code discriminant network; correspondingly, the target discriminant information may include object discriminant information, style object discriminant information, and style code discriminant information.

In some embodiments, when the style fusion network to be trained is a network with a fixed fusion structure, the above-mentioned sample object style image, preset style object image, preset object image, first sample style encoding and second The sample style code is input into the discrimination network to be trained for style discrimination processing, and the target discrimination information obtained includes: inputting the sample object style image and the preset object image into the object discrimination network for object discrimination processing, and obtaining object discrimination information (the object discrimination information may include object Discriminate the corresponding characteristic information of the sample object style image output by the network and the preset object image); input the sample object style image and the preset style object image into the style object discrimination network to perform style object discrimination processing, and obtain the style object discrimination information (style object The discriminant information may include the corresponding feature information of the sample object style image output by the style object discrimination network and the preset style object image); the sample object style image, the first sample style code and the second sample style code are input into the style code discrimination network Perform style coding discrimination processing to obtain style coding discrimination information (the style coding discrimination information may include feature information corresponding to the first sample style code and the second sample style code output by the style coding discrimination network).

In another optional embodiment, when the style fusion network to be trained is a network capable of regulating the fusion degree, the above-mentioned sample object style image, preset style object image, preset object image, first sample style The encoding and the second sample style encoding are input into the discriminant network to be trained for style discriminant processing, and obtaining the target discriminant information may include: inputting the second sample object style image and the preset object image into the object discriminant network for object discriminant processing, and obtaining the object discriminant information ( The object discrimination information may include the second sample object style image output by the object discrimination network and the corresponding feature information of the preset object image); the first sample object style image and the preset style object image are input into the style object discrimination network for style Object discrimination processing, obtain style object discrimination information (this style object discrimination information can comprise the feature information corresponding to the first sample object style image output by the style object discrimination network and the preset style object image respectively); the second sample object style image , the first sample style code and the second sample style code are input into the style code discrimination network for style code discrimination processing, and the style code discrimination information is obtained (the style code discrimination information may include the first sample style code and the style code discrimination network output The second sample style codes corresponding feature information).

In the above embodiment, the adversarial training of the image generation network to be trained for generating the style image of the object is carried out from the three dimensions of style discrimination, style object discrimination and style coding discrimination, which can greatly improve the ability of the trained image generation network to stylize the image of the object. The representation ability of the method improves the quality of the generated object-style images.

In step S609, target loss information is determined according to the target discrimination information.

In some embodiments, the target loss information may include generation loss information corresponding to the image generation network to be trained and discrimination loss information corresponding to the discriminative network to be trained.

In some embodiments, in the process of determining the target loss information according to the target discrimination information, an adversarial loss function may be combined. In some embodiments, the object discrimination loss between the feature information corresponding to the second sample object style image output by the object discrimination network and the preset object image can be determined in combination with the adversarial loss function; in combination with the adversarial loss function, the style object discrimination network can be determined The style object discrimination loss between the corresponding feature information of the output first sample object style image and the preset style object image; combined with the adversarial loss function, determine the first sample style code and the second sample style code output by the style coding discriminant network Encodes the style encoding discriminative loss between the corresponding feature information.

Further, the object discrimination loss, the style object discrimination loss and the style encoding discrimination loss can be added to obtain the above generation loss information, and the negative number of the generation loss information can be used as the discrimination loss information.

In step S611, based on the target loss information, the style fusion network to be trained, the image generation network to be trained and the discrimination network to be trained are trained.

In some embodiments, the above-mentioned training of the style fusion network to be trained, the image generation network to be trained, and the discrimination network to be trained based on the target loss information may include: updating the image generation network to be trained and the style fusion network to be trained based on the generation loss information , and update the network parameters in the discriminant network to be trained (object discriminant network, style object discriminant network, and style encoding discriminant network) based on discriminative loss information; then, based on the updated style fusion network to be trained, images to be trained are generated The network and the discriminant network to be trained are repeatedly based on the sample network fusion parameters corresponding to the preset number of network layers to be trained in the style fusion network to be trained, and the style fusion process is performed on the first sample style code and the sample object code to obtain the sample style fusion code , to update the network parameters in the image generation network to be trained and the style fusion network to be trained based on the generation loss information, and update the discriminant network to be trained (object discriminant network, style object discriminant network and style encoding discriminant network) based on discriminative loss information The training of the network parameters is performed iteratively until the second preset convergence condition is reached.

In some embodiments, reaching the second preset convergence condition may be that the number of training iteration operations reaches the second preset training number. In some embodiments, reaching the second preset convergence condition may also be that the generated loss information is less than a second preset threshold. In the embodiment of this specification, the second preset training times and the second preset threshold can be preset in combination with the training speed and accuracy of the network in practical applications.

In step S613, the trained style fusion network to be trained is used as a style fusion network, and the trained image generation network to be trained is used as a target image generation network.

In some embodiments, when the second preset convergence condition is reached, the current style fusion network to be trained (the trained style fusion network to be trained) is used as the above-mentioned style fusion network, and the current image to be trained is generated Network (trained image generation network to be trained) is used as the above-mentioned target image generation network.

In some embodiments, as shown in FIG. 8 , FIG. 8 is a schematic diagram of a training style fusion network and a target image generation network provided according to an exemplary embodiment.

In the above embodiments, the joint training of the target image generation network and the style fusion network can realize the fusion of style features and object features, and the target image generation network can be trained based on the fused sample object style images, which can greatly improve the training performance. The ability of the image generation network to represent the object style and object features can effectively improve the quality of the subsequent generated object style images.

In some embodiments, a large number of preset object style images of the target style can be generated based on the above-mentioned image generation method provided by the embodiments of this specification. The image generation method generates a large number of preset object style images of the target style, and conducts adversarial training on the first preset image generation network to obtain the first style transfer network. The first style transfer network may be used to generate an object image of a target style for a certain target object. In some embodiments, during the process of adversarial training for the first preset image generation network based on the first sample object image and the above-mentioned image generation method to generate a preset object style image of the target style, the first sample object can be The image is input into the first preset image generation network for style conversion processing to obtain the object style image corresponding to the first sample object image; and the object style image and the corresponding preset object style image are input into the corresponding discriminant network for style discrimination processing , to obtain the first style discriminant information; and based on the first style discriminant information, determine the corresponding discriminant loss information; furthermore, the first preset image generation network and the corresponding discriminant network can be trained based on the discriminative loss information, and the trained The first preset image generation network is used as the first style transfer network.

In some embodiments, based on the above-mentioned image generation method provided by the embodiments of this specification, preset object style images of various target styles can be generated. The target style label and the above image generation method generate preset object style images of multiple target styles, and conduct confrontation training on the second preset image generation network to obtain a second style transfer network. The second style transfer network can be used to generate object style images of various target styles. In some embodiments, based on the second sample object image (object images of a large number of real objects) multiple object style labels and the above-mentioned image generation method to generate preset object style images of multiple target styles, the second preset image is generated In the process of adversarial training of the network, the second sample object image and the corresponding target style label can be input into the second preset image generation network for style conversion processing, and the object style corresponding to the second sample object image matching the target style label can be obtained image; and the object style image and the corresponding preset object style image are input into the corresponding discriminant network for style discriminant processing to obtain the second style discriminant information; in addition, a discriminant network can also be added to generate the second preset image The object style image output by the network is judged whether it is the style corresponding to the target style label, and the third style discrimination information is obtained; and based on the second style discrimination information and the third style discrimination information, the corresponding loss information is determined; and then based on the loss information To train the second preset image generation network and the corresponding discriminant network, and use the trained second preset image generation network as the second style transfer network.

It can be seen from the above technical solutions provided by the embodiments of this specification that in the process of generating a stylized image (preset object style image) of a certain type of object, this specification decouples the stylized image into two parts: object coding and style coding. And in the style fusion network, combined with the fusion data corresponding to the target fusion weight and the preset number of network layers, and the determined network fusion parameters, the style fusion processing is performed on the target style code and the preset object code, and the fusion of the two can be performed , to obtain the target style fusion code that can not only represent the object characteristics of a certain type of object, but also effectively integrate into the target style, and the target fusion weight is based on the target style code and the preset object code for fusion weight learning, which can realize adaptive adjustment The fusion weights of objects under different target styles, and then better fuse the object style encoding under the target style. On the basis of greatly improving the stylized effect and stylized image quality, it can greatly improve the adaptive generation efficiency of multi-style object style images.

Fig. 9 is a flowchart of another image generation method according to an exemplary embodiment. As shown in Fig. 9, the image generation method is used in terminals and server electronic devices, and includes the following steps.

In step S901, a first original object image of a first target object is acquired;

In step S903, input the first original object image into the first style conversion network to perform style conversion processing, and obtain the first target object style image corresponding to the first target object;

In some embodiments, the first original object image may be an object image of the first target object uploaded by the user through the terminal. Taking the first target object as an example of a user's face, the first original object image may be a real face image of the user.

In some embodiments, the first target object style image may be an object image of the target style of the first target object.

In some embodiments, the terminal may input the first original object image into the first style conversion network for style conversion processing to obtain the first target object style image corresponding to the first target object. The terminal may also send the first original object image to the server, and the server generates the first target object style image based on the first style conversion network and transmits it to the terminal.

In the above embodiment, the original object image of the first target object is combined with the first style conversion network trained by the preset object style image that not only retains the object characteristics of the natural object but also effectively incorporates the style characteristics of the target style. Style conversion, on the basis of effectively improving the stylization effect, can ensure the consistency between the object features in the style image of the first target object after style conversion and the object features of the first target object, thereby greatly improving the quality of the stylized image .

Fig. 10 is a flow chart of another image generation method according to an exemplary embodiment. As shown in Fig. 10, the image generation method is used in terminals and server electronic devices, and includes the following steps.

In step S1001, a second original object image and a target style label of a second target object are obtained;

In step S1003, input the second original object image and the target style label into the second style conversion network for style conversion processing, and obtain the second target object style image corresponding to the second target object;

In some embodiments, the second original object image may be an object image of the second target object uploaded by the user through the terminal. Taking the second target object as an example of a user's face, the second original object image may be a real face image of the user. The target style tag may be identification information of a certain style selected for the user.

In some embodiments, the style image of the second target object may be an object image of a style corresponding to the target style label of the second target object.

In some embodiments, the terminal may input the second original object image and the target style label into the second style conversion network for style conversion processing to obtain the second target object style image corresponding to the second target object. The terminal may also send the second original object image and the target style label to the server, and the server generates the second target object style image based on the second style conversion network and transmits it to the terminal.

In the above-mentioned embodiment, the second style conversion network obtained by training the preset object style image that not only retains the object characteristics of the natural object but also effectively incorporates the style characteristics of various target styles is used to convert the original object of the second target object The style conversion of the image corresponding to the style of the target style label can effectively improve the stylization effect, and ensure the consistency between the object features in the style image of the second target object after style conversion and the object features of the second target object, and then Greatly improved the quality of stylized images.

Fig. 11 is a block diagram of an image generating device according to an exemplary embodiment. Referring to Figure 11, the device includes:

The code acquisition module 1110 is configured to perform target style coding for acquiring preset object codes and target styles;

The first style fusion processing module 1120 is configured to perform style fusion processing on the target style code and the preset object code based on network fusion parameters corresponding to a preset number of network layers in the style fusion network to obtain the target style fusion code; network The fusion parameters are determined based on fusion data corresponding to a preset number of network layers and target fusion weights, and the target fusion weights are obtained by learning fusion weights based on target style coding and preset object coding;

The first image generation processing module 1130 is configured to input the target style fusion code into the target image generation network for image generation processing, and obtain a preset object style image corresponding to the target style.

In some embodiments, the first style fusion processing module 1120 includes:

The target fusion control data acquisition unit is configured to execute the acquisition of target fusion control data, and the target fusion control data is used to control the fusion position of the target style code and the preset object code in the style fusion network;

The fused data determination unit is configured to perform fusion control data according to the target, and determine fused data corresponding to a preset number of network layers;

The first fusion weight learning unit is configured to perform fusion weight learning based on target splicing coding to obtain target fusion weights;

The first weighting processing unit is configured to perform weighting processing on the fusion data and the target fusion weight to obtain network fusion parameters;

The first style fusion processing unit is configured to perform style fusion processing on the target style code and the preset object code in a preset number of network layers based on the network fusion parameters to obtain the target style fusion code.

In some embodiments, the fusion data determination unit includes:

The fused data determination subunit is configured to determine fused data corresponding to a preset number of network layers according to the comparison result.

In some embodiments, the code acquisition module 1110 includes:

a reference style image acquiring unit configured to acquire a reference style image of a target style;

In some embodiments, the above-mentioned device also includes:

The sample image acquisition module is configured to perform acquisition of a positive sample style image pair and a negative sample style image pair of the target style;

The style coding processing module is configured to perform style coding processing on inputting the positive sample style image pair and the negative sample style image pair into the style coding network to be trained, and obtain the respective sample style codes corresponding to the positive sample style image pair and the negative sample style image pair ;

The perceptual processing module is configured to input the sample style code into the perceptual network to be trained for perceptual processing, and obtain the respective sample perceptual feature information corresponding to the positive sample style image pair and the negative sample style image pair;

The comparison loss information determination module is configured to determine the comparison loss information according to the sample perception feature information;

The first network training module is configured to perform the training of the style encoding network to be trained and the perception network to be trained based on the comparison loss information;

The style coding network determination module is configured to execute the trained style coding network to be trained as the style coding network.

In some embodiments, the code acquisition module 1110 includes:

In some embodiments, the above-mentioned device also includes:

A sample data acquisition module configured to perform acquisition of a first sample style code of a target style, a sample object code, a second sample style code of a non-target style, a preset style object image, and a preset object image;

The second style fusion processing module is configured to perform style fusion processing on the first sample style code and the sample object code based on the sample network fusion parameters corresponding to a preset number of network layers to be trained in the style fusion network to be trained, and obtain Sample style fusion encoding;

The style discrimination processing module is configured to perform style discrimination processing by inputting the sample object style image, the preset style object image, the preset object image, the first sample style code and the second sample style code into the discriminant network to be trained to obtain the target discriminant information;

The target loss information determination module is configured to determine the target loss information according to the target discrimination information;

The second network training module is configured to perform training based on the target loss information to train the style fusion network to be trained, the image generation network to be trained and the discrimination network to be trained;

In some embodiments, the sample style fusion encoding includes a first style fusion encoding and a second style fusion encoding; the second style fusion processing module includes:

The sample fusion control data acquisition unit is configured to execute the acquisition of the first fusion control data and the second fusion control data, the first fusion control data is used to control the first sample style coding and sample object coding, from the style fusion network to be trained The first network layer starts to fuse; the second fusion control data is used to control the first sample style code not to participate in fusion in the style fusion network to be trained;

The sample fusion data determination unit is configured to determine the first sample fusion data and the second sample fusion data corresponding to the preset number of network layers to be trained according to the first fusion control data and the second fusion control data;

The second splicing processing unit is configured to perform splicing processing on the first sample style code and the sample object code to obtain the sample spliced code;

The second fused weight learning unit is configured to perform fused weight learning based on sample splicing coding to obtain sample fused weights;

The second weighting processing unit is configured to perform weighting processing on the first sample fusion data, the second sample fusion data and the sample fusion weight respectively, to obtain the first sample network fusion parameters and the second sample network fusion parameters;

The second style fusion processing unit is configured to perform the first sample style coding and sample object coding in a preset number of network layers to be trained based on the first sample network fusion parameters and the second sample network fusion parameters, respectively. Style fusion processing to obtain the first style fusion code and the second style fusion code.

In some embodiments, the sample object style image includes a first sample object style image corresponding to the first style fusion encoding and a second sample object style image corresponding to the second style fusion encoding; the discriminant network to be trained includes an object discriminant network, a style Object discrimination network and style coding discrimination network; target discrimination information includes object discrimination information, style object discrimination information and style coding discrimination information;

The style discrimination processing module includes:

The object discrimination processing unit is configured to perform object discrimination processing by inputting the second sample object style image and the preset object image into the object discrimination network to obtain object discrimination information;

a style object discrimination processing unit configured to input the first sample object style image and the preset style object image into the style object discrimination network for style object discrimination processing, and obtain style object discrimination information;

The style code discrimination processing unit is configured to input the second sample object style image, the first sample style code and the second sample style code into the style code discrimination network for style code discrimination processing to obtain style code discrimination information.

Fig. 12 is a block diagram of another image generating device according to an exemplary embodiment. Referring to Figure 12, the device includes:

an original object image acquisition module 1210, configured to perform acquisition of a first original object image of the first target object;

The first style conversion processing module 1220 is configured to input the first original object image into the first style conversion network for style conversion processing, and obtain the first target object style image corresponding to the first target object;

The first style conversion network is obtained by performing adversarial training on the first preset image generation network based on the first sample object image and the preset object style image of the target style generated by the above-mentioned image generation method.

Fig. 13 is a block diagram of another image generating device according to an exemplary embodiment. Referring to Figure 13, the device includes:

a data acquisition module 1310 configured to acquire a second original object image and a target style label of a second target object;

The second style conversion processing module 1320 is configured to perform style conversion processing by inputting the second original object image and the target style label into the second style conversion network to obtain a second target object style image corresponding to the second target object;

The second style conversion network is a preset object style image based on the second sample object image, multiple target style labels, and multiple target styles generated by the above image generation method, and the second preset target image generation network is subjected to adversarial training to obtain of.

Regarding the apparatus in the foregoing embodiments, the specific manner in which each module executes operations has been described in detail in the embodiments related to the method, and will not be described in detail here.

Fig. 14 is a block diagram of an electronic device for image generation according to an exemplary embodiment. The electronic device may be a terminal, and its internal structure may be as shown in Fig. 14 . The electronic device includes a processor, a memory, a network interface, a display screen and an input device connected through a system bus. Wherein, the processor of the electronic device is used to provide calculation and control capabilities. The memory of the electronic device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and computer programs. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The network interface of the electronic device is used to communicate with an external terminal through a network connection. When the computer program is executed by a processor, an image generation method is realized. The display screen of the electronic device may be a liquid crystal display screen or an electronic ink display screen, and the input device of the electronic device may be a touch layer covered on the display screen, or a button, a trackball or a touch pad provided on the housing of the electronic device , and can also be an external keyboard, touchpad, or mouse.

Fig. 15 is a block diagram of another electronic device for image generation according to an exemplary embodiment. The electronic device may be a server, and its internal structure may be as shown in Fig. 15 . The electronic device includes a processor, memory and network interface connected by a system bus. Wherein, the processor of the electronic device is used to provide calculation and control capabilities. The memory of the electronic device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and computer programs. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The network interface of the electronic device is used to communicate with an external terminal through a network connection. When the computer program is executed by a processor, an image generation method is realized.

Those skilled in the art can understand that the structure shown in FIG. 14 or FIG. 15 is only a block diagram of a part of the structure related to the disclosed solution, and does not constitute a limitation on the electronic equipment to which the disclosed solution is applied. In some In embodiments, the electronic device may include more or fewer components than shown in the figures, or combine certain components, or have a different arrangement of components.

In an exemplary embodiment, there is also provided an electronic device, including: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions, so as to realize the implementation of the present disclosure. The image generation method in the example.

In an exemplary embodiment, a computer-readable storage medium is also provided, and when instructions in the storage medium are executed by a processor of the electronic device, the electronic device can execute the image generation method in the embodiments of the present disclosure.

In an exemplary embodiment, there is also provided a computer program product including instructions, which, when run on a computer, cause the computer to execute the image generation method in the embodiments of the present disclosure.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be realized by instructing related hardware through a computer program, and the computer program can be stored in a non-volatile computer-readable storage medium , when the computer program is executed, it may include the procedures of the embodiments of the above-mentioned methods. Wherein, any references to memory, storage, database or other media used in the various embodiments provided in the present application may include non-volatile and/or volatile memory. Nonvolatile memory can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Chain Synchlink DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), etc.

All the embodiments of the present disclosure can be implemented independently or in combination with other embodiments, which are all regarded as the scope of protection required by the present disclosure.

Claims

A method for generating an image, comprising:

Obtain the target style encoding of the preset object encoding and target style;

Based on network fusion parameters corresponding to a preset number of network layers in the style fusion network, perform style fusion processing on the target style code and the preset object code to obtain a target style fusion code; the network fusion parameters are based on the The fusion data corresponding to a preset number of network layers and the target fusion weight are determined, and the target fusion weight is obtained by performing fusion weight learning based on the target style code and the preset object code;

The target style fusion code is input into the target image generation network for image generation processing, and the preset object style image corresponding to the target style is obtained.
The image generation method according to claim 1, wherein the network fusion parameters corresponding to a preset number of network layers in the style fusion network are used to perform style fusion on the target style code and the preset object code Processing to obtain the target style fusion encoding includes:

Acquiring target fusion control data, the target fusion control data is used to control the fusion position of the target style code and the preset object code in the style fusion network;

determining fusion data corresponding to the preset number of network layers according to the target fusion control data;

Perform splicing processing on the target style code and the preset object code to obtain the target spliced code;

performing fusion weight learning based on the target splicing code to obtain the target fusion weight;

performing weighting processing on the fusion data and the target fusion weight to obtain the network fusion parameters;

Perform style fusion processing on the target style code and the preset object code in the preset number of network layers based on the network fusion parameters to obtain the target style fusion code.
The image generation method according to claim 2, wherein said determining fusion data corresponding to said preset number of network layers according to said target fusion control data comprises:

Comparing the number of layers corresponding to the preset number of network layers with the target fusion control data to obtain a comparison result;

According to the comparison result, determine fusion data corresponding to the preset number of network layers.
The image generation method according to any one of claims 1 to 3, wherein said acquiring the target style encoding of the target style comprises:

Acquiring a reference style image of the target style;

Inputting the reference style image into the style encoding network for style encoding processing to obtain the target style encoding.
The image generation method according to claim 4, wherein the method further comprises:

Obtaining a positive sample style image pair and a negative sample style image pair of the target style;

Inputting the positive sample style image pair and the negative sample style image pair into the style coding network to be trained for style coding processing to obtain the sample style codes corresponding to the positive sample style image pair and the negative sample style image pair;

Inputting the sample style code into the perceptual network to be trained for perceptual processing, and obtaining the respective sample perceptual feature information corresponding to the positive sample style image pair and the negative sample style image pair;

determining comparative loss information according to the perceptual feature information of the sample;

training the style encoding network to be trained and the perception network to be trained based on the comparative loss information;

The trained style coding network to be trained is used as the style coding network.
The image generation method according to any one of claims 1 to 3, wherein said acquiring the target style encoding of the target style comprises:

Randomly generate an initial style code based on a first preset distribution;

Inputting the initial style code into the first multi-layer perceptual network for perceptual processing to obtain the target style code.
The image generation method according to any one of claims 1 to 3, wherein the preset object code includes obtaining by the following steps:

Randomly generating an initial object code based on a second preset distribution;

Inputting the initial object code into the second multi-layer perceptual network for perceptual processing to obtain the preset object code.
The image generation method according to any one of claims 1 to 3, wherein the method further comprises:

Acquiring a first sample style code of the target style, a sample object code, a second sample style code of a non-target style, a preset style object image, and a preset object image;

Based on the sample network fusion parameters corresponding to the preset number of network layers to be trained in the style fusion network to be trained, perform style fusion processing on the first sample style code and the sample object code to obtain a sample style fusion code;

Inputting the sample style fusion code into the image generation network to be trained for image generation processing to obtain the sample object style image corresponding to the target style;

Input the sample object style image, the preset style object image, the preset object image, the first sample style code and the second sample style code into the discriminant network to be trained for style discrimination processing, and obtain the target discrimination information;

determining target loss information according to the target discrimination information;

Training the style fusion network to be trained, the image generation network to be trained, and the discrimination network to be trained based on the target loss information;

The trained style fusion network to be trained is used as the style fusion network, and the trained image generation network to be trained is used as the target image generation network.
The image generation method according to claim 8, wherein the sample style fusion coding includes a first style fusion coding and a second style fusion coding; the preset number of networks to be trained based on the style fusion network to be trained The sample network fusion parameters corresponding to the layer, the style fusion processing is performed on the first sample style code and the sample object code, and the sample style fusion code obtained includes:

Acquiring first fusion control data and second fusion control data, the first fusion control data is used to control the first sample style code and the sample object code, from the first style fusion network to be trained The first network layer starts to fuse; the second fusion control data is used to control the first sample style code not to participate in fusion in the style fusion network to be trained;

According to the first fusion control data and the second fusion control data, respectively determine the first sample fusion data and the second sample fusion data corresponding to the preset number of network layers to be trained;

Concatenating the first sample style code and the sample object code to obtain a sample concatenation code;

Perform fusion weight learning based on the sample splicing code to obtain the sample fusion weight;

performing weighting processing on the first sample fusion data, the second sample fusion data and the sample fusion weight respectively, to obtain the first sample network fusion parameters and the second sample network fusion parameters;

Based on the first sample network fusion parameters and the second sample network fusion parameters, respectively perform style fusion processing on the first sample style code and the sample object code in the preset number of network layers to be trained , to obtain the first style fusion code and the second style fusion code.
The image generation method according to claim 9, wherein the sample object style image includes the first sample object style image corresponding to the first style fusion coding and the second style fusion coding corresponding to the second style image. A sample object style image; the discriminant network to be trained includes an object discriminant network, a style object discriminant network, and a style encoding discriminant network; the target discriminant information includes object discriminant information, style object discriminant information, and style code discriminant information;

The sample object style image, the preset style object image, the preset object image, the first sample style code and the second sample style code are input into the discriminant network to be trained for style discrimination processing, and the obtained Target discrimination information includes:

inputting the second sample object style image and the preset object image into the object discrimination network for object discrimination processing to obtain object discrimination information;

inputting the first sample object style image and the preset style object image into the style object discrimination network to perform style object discrimination processing to obtain style object discrimination information;

Inputting the second sample object style image, the first sample style code and the second sample style code into the style code discrimination network for style code discrimination processing to obtain style code discrimination information.
A method for generating an image, comprising:

acquiring a first original object image of a first target object;

inputting the first original object image into a first style conversion network to perform style conversion processing to obtain a first target object style image corresponding to the first target object;

The first style conversion network is a preset object style image based on the first sample object image and the target style generated by any image generation method according to claims 1 to 10, and the first preset image generation network is subjected to confrontation training to obtain of.
A method for generating an image, comprising:

Obtaining a second original object image and a target style label of a second target object;

inputting the second original object image and the target style label into a second style conversion network for style conversion processing, to obtain a second target object style image corresponding to the second target object;

The second style conversion network is a preset object style image based on the second sample object image, a variety of target style labels and a variety of target styles generated by any image generation method in claims 1 to 10, for the second preset The target image generation network is obtained by adversarial training.
An image generating device, characterized in that it comprises:

an encoding acquisition module configured to perform target style encoding for acquiring preset object encodings and target styles;

The first style fusion processing module is configured to perform style fusion processing on the target style code and the preset object code based on network fusion parameters corresponding to a preset number of network layers in the style fusion network to obtain a target style fusion Coding; the network fusion parameters are determined based on fusion data corresponding to the preset number of network layers and target fusion weights, and the target fusion weights are fusion weights based on the target style coding and the preset object coding learned;

The first image generation processing module is configured to input the target style fusion code into the target image generation network for image generation processing, and obtain a preset object style image corresponding to the target style.
The image generating device according to claim 13, wherein the first style fusion processing module comprises:

A target fusion control data acquisition unit configured to acquire target fusion control data, the target fusion control data is used to control the fusion position of the target style code and the preset object code in the style fusion network;

The fused data determination unit is configured to determine fused data corresponding to the preset number of network layers according to the target fused control data;

The first splicing processing unit is configured to perform splicing processing on the target style code and the preset object code to obtain the target spliced code;

The first fusion weight learning unit is configured to perform fusion weight learning based on the target splicing code to obtain the target fusion weight;

a first weighting processing unit configured to perform weighting processing on the fusion data and the target fusion weight to obtain the network fusion parameters;

The first style fusion processing unit is configured to perform style fusion processing on the target style code and the preset object code in the preset number of network layers based on the network fusion parameters to obtain the target style fusion encoding.
The image generating device according to claim 14, wherein the fusion data determining unit comprises:

The comparison unit is configured to compare the number of layers corresponding to the preset number of network layers with the target fusion control data to obtain a comparison result;

The fused data determining subunit is configured to determine fused data corresponding to the preset number of network layers according to the comparison result.
The image generating device according to any one of claims 13 to 15, wherein the code acquisition module includes:

a reference style image acquisition unit configured to acquire a reference style image of the target style;

The style encoding processing unit is configured to input the reference style image into the style encoding network for style encoding processing to obtain the target style encoding.
The image generating device according to claim 16, wherein the device further comprises:

The sample image acquisition module is configured to acquire the positive sample style image pair and the negative sample style image pair of the target style;

The style coding processing module is configured to perform style coding processing by inputting the positive sample style image pair and the negative sample style image pair into the style coding network to be trained, to obtain the positive sample style image pair and the negative sample style image pair The respective sample style codes;

The perceptual processing module is configured to input the sample style code into the perceptual network to be trained for perceptual processing, and obtain the sample perceptual feature information corresponding to the positive sample style image pair and the negative sample style image pair;

A comparison loss information determination module configured to determine the comparison loss information according to the sample perception feature information;

A first network training module configured to perform training of the style coding network to be trained and the perception network to be trained based on the comparison loss information;

The style coding network determining module is configured to execute the trained style coding network to be trained as the style coding network.
The image generating device according to any one of claims 13 to 15, wherein the code acquisition module includes:

an initial style code generating unit configured to randomly generate an initial style code based on a first preset distribution;

The first perceptual processing unit is configured to input the initial style code into the first multi-layer perceptual network for perceptual processing to obtain the target style code.
The image generating device according to any one of claims 13 to 15, wherein the code acquisition module includes:

The initial object code generating unit is configured to randomly generate the initial object code based on the second preset distribution;

The second perceptual processing unit is configured to input the initial object code into the second multi-layer perceptual network for perceptual processing to obtain the preset object code.
The image generating device according to any one of claims 13 to 15, wherein the device further comprises:

A sample data acquisition module configured to acquire the first sample style code of the target style, the sample object code, the second sample style code of the non-target style, the preset style object image and the preset object image;

The second style fusion processing module is configured to perform style encoding on the first sample style code and the sample object code based on sample network fusion parameters corresponding to a preset number of network layers to be trained in the style fusion network to be trained Fusion processing to obtain sample style fusion coding;

The second image generation processing module is configured to input the sample style fusion code into the image generation network to be trained for image generation processing, and obtain the sample object style image corresponding to the target style;

a style discrimination processing module configured to input the sample object style image, the preset style object image, the preset object image, the first sample style code and the second sample style code into the discriminator to be trained The network performs style discrimination processing to obtain target discrimination information;

A target loss information determination module configured to determine target loss information according to the target discrimination information;

The second network training module is configured to perform training of the style fusion network to be trained, the image generation network to be trained, and the discrimination network to be trained based on the target loss information;

The network determination module is configured to use the trained style fusion network to be trained as the style fusion network, and use the trained image generation network to be trained as the target image generation network.
The image generation device according to claim 20, wherein the sample style fusion coding includes a first style fusion coding and a second style fusion coding; and the second style fusion processing module includes:

The sample fusion control data acquisition unit is configured to perform acquisition of first fusion control data and second fusion control data, the first fusion control data is used to control the first sample style code and the sample object code, from The first network layer in the style fusion network to be trained starts to fuse; the second fusion control data is used to control the first sample style code not to participate in fusion in the style fusion network to be trained;

The sample fusion data determination unit is configured to determine the first sample fusion data and the second fusion data corresponding to the preset number of network layers to be trained according to the first fusion control data and the second fusion control data. Sample fusion data;

A second splicing processing unit configured to perform splicing processing on the first sample style code and the sample object code to obtain a sample spliced code;

The second fusion weight learning unit is configured to perform fusion weight learning based on the sample splicing code to obtain the sample fusion weight;

The second weighting processing unit is configured to perform weighting processing on the first sample fusion data, the second sample fusion data and the sample fusion weight respectively, to obtain the first sample network fusion parameters and the second sample network fusion parameter;

The second style fusion processing unit is configured to perform, based on the network fusion parameters of the first sample and the network fusion parameters of the second sample, the first sample style in the preset number of network layers to be trained respectively. performing style fusion processing on the coding and the sample object coding to obtain the first style fusion coding and the second style fusion coding.
The image generating device according to claim 21, wherein the sample object style image includes the first sample object style image corresponding to the first style fusion coding and the second style fusion coding corresponding to the second style image. A sample object style image; the discriminant network to be trained includes an object discriminant network, a style object discriminant network, and a style encoding discriminant network; the target discriminant information includes object discriminant information, style object discriminant information, and style code discriminant information;

The style discrimination processing module includes:

The object discrimination processing unit is configured to execute inputting the second sample object style image and the preset object image into the object discrimination network for object discrimination processing to obtain object discrimination information;

a style object discrimination processing unit configured to perform style object discrimination processing by inputting the first sample object style image and the preset style object image into the style object discrimination network to obtain style object discrimination information;

a style encoding discrimination processing unit configured to perform style encoding discrimination processing by inputting the second sample object style image, the first sample style code and the second sample style code into the style coding discrimination network to obtain Style encodes discriminative information.
An image generating device, characterized in that it comprises:

an original object image acquisition module configured to perform acquisition of a first original object image of the first target object;

The first style conversion processing module is configured to perform style conversion processing by inputting the first original object image into a first style conversion network to obtain a first target object style image corresponding to the first target object;

The first style conversion network is a preset object style image based on the first sample object image and the target style generated by any image generation method according to claims 1 to 10, and the first preset image generation network is subjected to confrontation training to obtain of.
An image generating device, characterized in that it comprises:

a data acquisition module configured to perform acquisition of a second original object image and a target style label of a second target object;

The second style conversion processing module is configured to input the second original object image and the target style label into a second style conversion network for style conversion processing, and obtain the second target object style corresponding to the second target object image;

The second style conversion network is a preset object style image based on the second sample object image, a variety of target style labels and a variety of target styles generated by any image generation method in claims 1 to 10, for the second preset The target image generation network is obtained by adversarial training.
An electronic device, characterized in that it comprises:

processor;

memory for storing said processor-executable instructions;

Wherein, the processor is configured to execute the instructions to perform the following steps:

Obtain the target style encoding of the preset object encoding and target style;

Based on network fusion parameters corresponding to a preset number of network layers in the style fusion network, perform style fusion processing on the target style code and the preset object code to obtain a target style fusion code; the network fusion parameters are based on the The fusion data corresponding to a preset number of network layers and the target fusion weight are determined, and the target fusion weight is obtained by performing fusion weight learning based on the target style code and the preset object code;

The target style fusion code is input into the target image generation network for image generation processing, and the preset object style image corresponding to the target style is obtained.
The electronic device according to claim 25, wherein the processor is further configured to perform the following steps:

Acquiring target fusion control data, the target fusion control data is used to control the fusion position of the target style code and the preset object code in the style fusion network;

determining fusion data corresponding to the preset number of network layers according to the target fusion control data;

Perform splicing processing on the target style code and the preset object code to obtain the target spliced code;

performing fusion weight learning based on the target splicing code to obtain the target fusion weight;

performing weighting processing on the fusion data and the target fusion weight to obtain the network fusion parameters;

Perform style fusion processing on the target style code and the preset object code in the preset number of network layers based on the network fusion parameters to obtain the target style fusion code.
The electronic device according to claim 26, wherein the processor is further configured to perform the following steps:

Comparing the number of layers corresponding to the preset number of network layers with the target fusion control data to obtain a comparison result;

According to the comparison result, determine fusion data corresponding to the preset number of network layers.
The electronic device according to any one of claims 25 to 27, wherein the processor is further configured to perform the following steps:

Acquiring a reference style image of the target style;

Inputting the reference style image into the style encoding network for style encoding processing to obtain the target style encoding.
The electronic device according to claim 28, wherein the processor is further configured to perform the following steps:

Obtaining a positive sample style image pair and a negative sample style image pair of the target style;

Inputting the positive sample style image pair and the negative sample style image pair into the style coding network to be trained for style coding processing to obtain the sample style codes corresponding to the positive sample style image pair and the negative sample style image pair;

Inputting the sample style code into the perceptual network to be trained for perceptual processing, and obtaining the respective sample perceptual feature information corresponding to the positive sample style image pair and the negative sample style image pair;

determining comparative loss information according to the perceptual feature information of the sample;

training the style encoding network to be trained and the perception network to be trained based on the comparative loss information;

The trained style coding network to be trained is used as the style coding network.
The electronic device method according to any one of claims 25 to 27, wherein the processor is further configured to perform the following steps:

Randomly generate an initial style code based on a first preset distribution;

Inputting the initial style code into the first multi-layer perceptual network for perceptual processing to obtain the target style code.
The electronic device according to any one of claims 25 to 27, wherein the processor is further configured to perform the following steps:

Randomly generating an initial object code based on a second preset distribution;

Inputting the initial object code into the second multi-layer perceptual network for perceptual processing to obtain the preset object code.
The electronic device according to any one of claims 25 to 27, wherein the processor is further configured to perform the following steps:

Acquiring the first sample style code of the target style, the sample object code, the second sample style code of the non-target style, the preset style object image and the preset object image;

Based on the sample network fusion parameters corresponding to the preset number of network layers to be trained in the style fusion network to be trained, perform style fusion processing on the first sample style code and the sample object code to obtain a sample style fusion code;

Inputting the sample style fusion code into the image generation network to be trained for image generation processing to obtain the sample object style image corresponding to the target style;

Input the sample object style image, the preset style object image, the preset object image, the first sample style code and the second sample style code into the discriminant network to be trained for style discrimination processing, and obtain the target discrimination information;

determining target loss information according to the target discrimination information;

Training the style fusion network to be trained, the image generation network to be trained, and the discrimination network to be trained based on the target loss information;

The trained style fusion network to be trained is used as the style fusion network, and the trained image generation network to be trained is used as the target image generation network.
The electronic device according to claim 32, wherein the style fusion coding of the sample comprises a first style fusion coding and a second style fusion coding; and the processor is further configured to perform the following steps:

Acquiring first fusion control data and second fusion control data, the first fusion control data is used to control the first sample style code and the sample object code, from the first style fusion network to be trained The first network layer starts to fuse; the second fusion control data is used to control the first sample style code not to participate in fusion in the style fusion network to be trained;

According to the first fusion control data and the second fusion control data, respectively determine the first sample fusion data and the second sample fusion data corresponding to the preset number of network layers to be trained;

Concatenating the first sample style code and the sample object code to obtain a sample concatenation code;

Perform fusion weight learning based on the sample splicing code to obtain the sample fusion weight;

performing weighting processing on the first sample fusion data, the second sample fusion data and the sample fusion weight respectively, to obtain the first sample network fusion parameters and the second sample network fusion parameters;

Based on the first sample network fusion parameters and the second sample network fusion parameters, respectively perform style fusion processing on the first sample style code and the sample object code in the preset number of network layers to be trained , to obtain the first style fusion code and the second style fusion code.
The electronic device according to claim 33, wherein the sample object style image includes a first sample object style image corresponding to the first style fusion coding and a second sample corresponding to the second style fusion coding An object style image; the discriminant network to be trained includes an object discriminant network, a style object discriminant network and a style code discriminant network; the target discriminant information includes object discriminant information, style object discriminant information and style code discriminant information;

The processor is also configured to perform the following steps:

inputting the second sample object style image and the preset object image into the object discrimination network for object discrimination processing to obtain object discrimination information;

inputting the first sample object style image and the preset style object image into the style object discrimination network to perform style object discrimination processing to obtain style object discrimination information;

Inputting the second sample object style image, the first sample style code and the second sample style code into the style code discrimination network for style code discrimination processing to obtain style code discrimination information.
An electronic device, characterized in that it comprises:

processor;

memory for storing said processor-executable instructions;

Wherein, the processor is configured to execute the instructions to perform the following steps:

acquiring a first original object image of a first target object;

inputting the first original object image into a first style conversion network to perform style conversion processing to obtain a first target object style image corresponding to the first target object;

The first style conversion network is a preset object style image based on the first sample object image and the target style generated by any image generation method according to claims 1 to 10, and the first preset image generation network is subjected to confrontation training to obtain of.
An electronic device, characterized in that it comprises:

processor;

memory for storing said processor-executable instructions;

Wherein, the processor is configured to execute the instructions to perform the following steps:

Obtaining a second original object image and a target style label of a second target object;

inputting the second original object image and the target style label into a second style conversion network for style conversion processing, to obtain a second target object style image corresponding to the second target object;

The second style conversion network is a preset object style image based on the second sample object image, a variety of target style labels and a variety of target styles generated by any image generation method in claims 1 to 10, for the second preset The target image generation network is obtained by adversarial training.
A computer-readable storage medium, characterized in that, in response to instructions in the storage medium being executed by a processor of an electronic device, the electronic device can execute the image as claimed in any one of claims 1 to 12 generate method.
A computer program product, comprising computer instructions, characterized in that, in response to the computer instructions being executed by a processor, the image generation method according to any one of claims 1 to 12 is implemented.