CN114202456A - Image generation method, image generation device, electronic equipment and storage medium - Google Patents
- Publication number
- CN114202456A (application CN202111371705.0A)
- Authority
- CN
- China
- Prior art keywords
- style
- target
- fusion
- network
- code
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/04—Context-preserving transformations, e.g. by using an importance map
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The disclosure relates to an image generation method, an image generation device, an electronic device and a storage medium, wherein the method comprises: obtaining a preset object code and a target style code of a target style; performing style fusion processing on the target style code and the preset object code based on network fusion parameters corresponding to a preset number of network layers in a style fusion network to obtain a target style fusion code, wherein the network fusion parameters are determined based on fusion data corresponding to the preset number of network layers and target fusion weights, and the target fusion weights are obtained by performing fusion weight learning based on the target style code and the preset object code; and inputting the target style fusion code into a target image generation network for image generation processing to obtain a preset object style image corresponding to the target style. By using the embodiments of the disclosure, high-quality object style images can be rapidly generated, and the adaptive generation efficiency of multi-style object style images is improved.
Description
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image generation method and apparatus, an electronic device, and a storage medium.
Background
With the continuous development of image processing technology, image style conversion has become a popular new feature in image applications. Image style conversion technology can generate an image with a target style based on a given object image, such as a human face, so that the image has an artistic effect similar to the target style; an object image can thus be converted into object style images of different styles such as animation, oil painting and pencil painting.
In the related art, a style conversion network with a style image generation function needs to be trained in advance, but training the style conversion network often requires a large number of object style images as training images, and the quality of the training images must be ensured so that the object style images generated by the style conversion network keep the features of the original object and image distortion is avoided. Therefore, how to quickly generate high-quality object style images that can be used to train the style conversion network is an urgent problem to be solved.
Disclosure of Invention
The present disclosure provides an image generation method, an image generation apparatus, an electronic device, and a storage medium, which can quickly generate a high-quality object-style image and improve adaptive generation efficiency of multi-style object-style images. The technical scheme of the disclosure is as follows:
according to a first aspect of embodiments of the present disclosure, there is provided an image generation method, including:
acquiring a preset object code and a target style code of a target style;
performing style fusion processing on the target style code and the preset object code based on network fusion parameters corresponding to a preset number of network layers in a style fusion network to obtain a target style fusion code; the network fusion parameters are determined based on fusion data corresponding to the preset number of network layers and target fusion weights, and the target fusion weights are obtained by performing fusion weight learning based on the target style codes and the preset object codes;
and inputting the target style fusion code into a target image generation network for image generation processing to obtain a preset object style image corresponding to the target style.
Optionally, the performing style fusion processing on the target style code and the preset object code based on network fusion parameters corresponding to a preset number of network layers in the style fusion network to obtain the target style fusion code includes:
acquiring target fusion regulation and control data, wherein the target fusion regulation and control data are used for regulating and controlling the fusion position of the target style code and the preset object code in the style fusion network;
determining fusion data corresponding to the preset number of network layers according to the target fusion regulation and control data;
splicing the target style code and the preset object code to obtain a target splicing code;
performing fusion weight learning based on the target splicing codes to obtain the target fusion weight;
weighting the fusion data and the target fusion weight to obtain the network fusion parameters;
and performing style fusion processing on the target style code and the preset object code in the preset number of network layers based on the network fusion parameters to obtain the target style fusion code.
Optionally, the determining, according to the target fusion regulation and control data, the fusion data corresponding to the preset number of network layers includes:
comparing the number of layers corresponding to the preset number of network layers with the target fusion regulation and control data to obtain a comparison result;
and determining, according to the comparison result, the fusion data corresponding to the preset number of network layers.
Optionally, the obtaining of the target style code of the target style includes:
acquiring a reference style image of the target style;
and inputting the reference style image into a style coding network for style coding processing to obtain the target style code.
Optionally, the method further includes:
acquiring a positive sample style image pair and a negative sample style image pair of the target style;
inputting the positive sample style image pair and the negative sample style image pair into a style coding network to be trained for style coding processing to obtain sample style codes corresponding to the positive sample style image pair and the negative sample style image pair respectively;
inputting the sample style codes into a perception network to be trained for perception processing to obtain sample perception characteristic information corresponding to the positive sample style image pair and the negative sample style image pair;
determining contrast loss information according to the sample perception characteristic information;
training the style coding network to be trained and the perception network to be trained based on the contrast loss information;
and taking the trained style coding network to be trained as the style coding network.
Optionally, the obtaining of the target style code of the target style includes:
randomly generating an initial style code based on the first preset distribution;
and inputting the initial style code into a first multilayer perception network for perception processing to obtain the target style code.
Optionally, the obtaining of the preset object code includes the following steps:
randomly generating an initial object code based on the second preset distribution;
and inputting the initial object code into a second multilayer perception network for perception processing to obtain the preset object code.
Optionally, the method further includes:
acquiring a first sample style code, a sample object code, a second sample style code, a preset style object image and a preset object image of the target style;
performing style fusion processing on the first sample style code and the sample object code based on sample network fusion parameters corresponding to a preset number of network layers to be trained in a style fusion network to be trained to obtain a sample style fusion code;
inputting the sample style fusion code into an image generation network to be trained for image generation processing to obtain a sample object style image corresponding to the target style;
inputting the sample object style image, the preset style object image, the preset object image, the first sample style code and the second sample style code into a to-be-trained discrimination network for style discrimination processing to obtain target discrimination information;
determining target loss information according to the target discrimination information;
training the style fusion network to be trained, the image generation network to be trained and the discrimination network to be trained based on the target loss information;
and taking the trained style fusion network to be trained as the style fusion network, and taking the trained image generation network to be trained as the target image generation network.
Optionally, the sample style fusion code includes a first style fusion code and a second style fusion code; the style fusion processing is performed on the first sample style code and the sample object code based on sample network fusion parameters corresponding to a preset number of network layers to be trained in the style fusion network to be trained, and obtaining the sample style fusion code comprises:
acquiring first fusion regulation and control data and second fusion regulation and control data, wherein the first fusion regulation and control data is used for controlling the first sample style code and the sample object code to be fused starting from the first network layer in the to-be-trained style fusion network, and the second fusion regulation and control data is used for controlling the first sample style code not to participate in fusion in the to-be-trained style fusion network;
respectively determining first sample fusion data and second sample fusion data corresponding to the preset number of network layers to be trained according to the first fusion regulation data and the second fusion regulation data;
splicing the first sample style code and the sample object code to obtain a sample splicing code;
performing fusion weight learning based on the sample splicing codes to obtain the sample fusion weight;
weighting the first sample fusion data and the second sample fusion data with the sample fusion weight respectively to obtain a first sample network fusion parameter and a second sample network fusion parameter;
based on the first sample network fusion parameter and the second sample network fusion parameter, respectively performing style fusion processing on the first sample style code and the sample object code in the preset number of network layers to be trained to obtain the first style fusion code and the second style fusion code.
Optionally, the sample object style images include a first sample object style image corresponding to the first style fusion code and a second sample object style image corresponding to the second style fusion code; the to-be-trained discrimination network comprises an object discrimination network, a style object discrimination network and a style code discrimination network; the target discrimination information comprises object discrimination information, style object discrimination information and style coding discrimination information;
the step of inputting the sample object style image, the preset style object image, the preset object image, the first sample style code and the second sample style code into a to-be-trained discrimination network for style discrimination processing to obtain target discrimination information includes:
inputting the second sample object style image and the preset object image into the object discrimination network for object discrimination processing to obtain the object discrimination information;
inputting the first sample object style image and the preset style object image into the style object discrimination network for style object discrimination processing to obtain the style object discrimination information;
and inputting the second sample object style image, the first sample style code and the second sample style code into the style code discrimination network for style code discrimination processing to obtain the style code discrimination information.
According to a second aspect of the embodiments of the present disclosure, there is provided an image generation method including:
acquiring a first original object image of a first target object;
inputting the first original object image into a first style conversion network for style conversion processing to obtain a first target object style image corresponding to the first target object;
the first style conversion network is obtained by performing adversarial training on a first preset image generation network based on a first sample object image and a preset object style image of the target style generated by any image generation method provided by the first aspect.
According to a third aspect of the embodiments of the present disclosure, there is provided an image generation method including:
acquiring a second original object image and a target style label of a second target object;
inputting the second original object image and the target style label into a second style conversion network for style conversion processing to obtain a second target object style image corresponding to the second target object;
the second style conversion network is obtained by performing adversarial training on a second preset target image generation network based on a second sample object image, a plurality of target style labels and preset object style images of a plurality of target styles generated by any image generation method provided by the first aspect.
According to a fourth aspect of the embodiments of the present disclosure, there is provided an image generation apparatus including:
the code acquisition module is configured to execute the steps of acquiring preset object codes and target style codes of a target style;
the first style fusion processing module is configured to perform style fusion processing on the target style code and the preset object code based on network fusion parameters corresponding to a preset number of network layers in a style fusion network to obtain a target style fusion code; the network fusion parameters are determined based on fusion data corresponding to the preset number of network layers and target fusion weights, and the target fusion weights are obtained by performing fusion weight learning based on the target style code and the preset object code;
and the first image generation processing module is configured to input the target style fusion code into a target image generation network for image generation processing to obtain a preset object style image corresponding to the target style.
Optionally, the first style fusion processing module includes:
a target fusion regulation and control data acquisition unit configured to perform acquisition of target fusion regulation and control data for regulating and controlling a fusion position of the target style code and the preset object code in the style fusion network;
the fusion data determining unit is configured to determine fusion data corresponding to the preset number of network layers according to the target fusion regulation and control data;
the first splicing processing unit is configured to perform splicing processing on the target style code and the preset object code to obtain a target splicing code;
a first fusion weight learning unit configured to perform fusion weight learning based on the target splicing code to obtain the target fusion weight;
the first weighting processing unit is configured to perform weighting processing on the fusion data and the target fusion weight to obtain the network fusion parameter;
and the first style fusion processing unit is configured to perform style fusion processing on the target style code and the preset object code in the preset number of network layers based on the network fusion parameters to obtain the target style fusion code.
Optionally, the fused data determining unit includes:
the comparison unit is configured to compare the number of layers corresponding to the preset number of network layers with the target fusion regulation and control data to obtain a comparison result;
and the fusion data determining subunit is configured to determine the fusion data corresponding to the preset number of network layers according to the comparison result.
Optionally, the code acquiring module includes:
a reference style image acquisition unit configured to perform acquisition of a reference style image of the target style;
and the style coding processing unit is configured to input the reference style image into a style coding network for style coding processing to obtain the target style code.
Optionally, the apparatus further comprises:
a sample image acquisition module configured to perform acquisition of a positive sample style image pair and a negative sample style image pair of the target style;
the style coding processing module is configured to input the positive sample style image pair and the negative sample style image pair into a to-be-trained style coding network for style coding processing to obtain sample style codes corresponding to the positive sample style image pair and the negative sample style image pair respectively;
the perception processing module is configured to input the sample style codes into a perception network to be trained for perception processing to obtain sample perception characteristic information corresponding to the positive sample style image pair and the negative sample style image pair;
a contrast loss information determination module configured to perform determining contrast loss information according to the sample perceptual feature information;
a first network training module configured to perform training of the to-be-trained style-coded network and the to-be-trained perceptual network based on the contrast loss information;
and the style coding network determining module is configured to execute the trained style coding network to be trained as the style coding network.
Optionally, the code acquiring module includes:
an initial style code generation unit configured to perform random generation of an initial style code based on a first preset distribution;
and the first perception processing unit is configured to input the initial style code into a first multilayer perception network for perception processing to obtain the target style code.
Optionally, the code acquiring module includes:
an initial object code generation unit configured to perform random generation of an initial object code based on a second preset distribution;
and the second perception processing unit is configured to input the initial object code into a second multilayer perception network for perception processing to obtain the preset object code.
Optionally, the apparatus further comprises:
the sample data acquisition module is configured to execute acquisition of a first sample style code, a sample object code, a second sample style code, a preset style object image and a preset object image of the target style;
the second style fusion processing module is configured to perform style fusion processing on the first sample style code and the sample object code based on sample network fusion parameters corresponding to a preset number of network layers to be trained in a style fusion network to be trained to obtain a sample style fusion code;
the second image generation processing module is configured to input the sample style fusion code into an image generation network to be trained for image generation processing to obtain a sample object style image corresponding to the target style;
the style discrimination processing module is configured to input the sample object style image, the preset style object image, the preset object image, the first sample style code and the second sample style code into a discrimination network to be trained for style discrimination processing to obtain target discrimination information;
a target loss information determination module configured to perform determining target loss information according to the target discrimination information;
a second network training module configured to perform training of the to-be-trained style fusion network, the to-be-trained image generation network, and the to-be-trained discrimination network based on the target loss information;
and the network determining module is configured to execute the trained style fusion network to be trained as the style fusion network and the trained image generation network to be trained as the target image generation network.
Optionally, the sample style fusion code includes a first style fusion code and a second style fusion code; the second style fusion processing module comprises:
a sample fusion regulation and control data acquisition unit configured to acquire first fusion regulation and control data and second fusion regulation and control data, wherein the first fusion regulation and control data is used for controlling the first sample style code and the sample object code to be fused starting from the first network layer in the style fusion network to be trained, and the second fusion regulation and control data is used for controlling the first sample style code not to participate in fusion in the style fusion network to be trained;
the sample fusion data determining unit is configured to execute the determination of first sample fusion data and second sample fusion data corresponding to the preset number of network layers to be trained according to the first fusion regulation data and the second fusion regulation data;
the second splicing processing unit is configured to perform splicing processing on the first sample style code and the sample object code to obtain a sample splicing code;
a second fusion weight learning unit configured to perform fusion weight learning based on the sample splicing code to obtain the sample fusion weight;
the second weighting processing unit is configured to perform weighting processing on the first sample fusion data and the second sample fusion data and the sample fusion weights respectively to obtain a first sample network fusion parameter and a second sample network fusion parameter;
and the second style fusion processing unit is configured to perform style fusion processing on the first sample style code and the sample object code in the preset number of network layers to be trained based on the first sample network fusion parameter and the second sample network fusion parameter respectively, so as to obtain the first style fusion code and the second style fusion code.
Optionally, the sample object style images include a first sample object style image corresponding to the first style fusion code and a second sample object style image corresponding to the second style fusion code; the to-be-trained discrimination network comprises an object discrimination network, a style object discrimination network and a style code discrimination network; the target discrimination information comprises object discrimination information, style object discrimination information and style coding discrimination information;
the style discrimination processing module comprises:
an object discrimination processing unit configured to perform object discrimination processing by inputting the second sample object style image and the preset object image into the object discrimination network to obtain object discrimination information;
a style object discrimination processing unit configured to input the first sample object style image and the preset style object image into the style object discrimination network to perform style object discrimination processing, so as to obtain style object discrimination information;
and the style code distinguishing processing unit is configured to input the second sample object style image, the first sample style code and the second sample style code into the style code distinguishing network for style code distinguishing processing to obtain style code distinguishing information.
According to a fifth aspect of the embodiments of the present disclosure, there is provided an image generation apparatus including:
an original object image acquisition module configured to perform acquisition of a first original object image of a first target object;
the first style conversion processing module is configured to input the first original object image into a first style conversion network for style conversion processing to obtain a first target object style image corresponding to the first target object;
the first style conversion network is obtained by performing adversarial training on a first preset image generation network based on a first sample object image and a preset object style image of the target style generated by any image generation method provided by the first aspect.
According to a sixth aspect of the embodiments of the present disclosure, there is provided an image generation apparatus comprising:
a data acquisition module configured to perform acquisition of a second original object image and a target style label of a second target object;
the second style conversion processing module is configured to input the second original object image and the target style label into a second style conversion network for style conversion processing to obtain a second target object style image corresponding to the second target object;
the second style conversion network is obtained by performing adversarial training on a second preset target image generation network based on a second sample object image, a plurality of target style labels and preset object style images of a plurality of target styles generated by any image generation method provided by the first aspect.
According to a seventh aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to implement the method of any one of the first, second and third aspects described above.
According to an eighth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions, when executed by a processor of an electronic device, enable the electronic device to perform the method of any one of the first, second and third aspects of the embodiments of the present disclosure.
According to a ninth aspect of the embodiments of the present disclosure, there is provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the method of any one of the first, second and third aspects of the embodiments of the present disclosure.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
in the process of generating stylized images (preset object style images) of a certain class of objects, the stylized image is decoupled into an object code and a style code. Network fusion parameters are determined by combining target fusion weights with fusion data corresponding to a preset number of network layers, and style fusion processing is performed on the target style code and the preset object code in the style fusion network, so that the two codes can be fused into a target style fusion code that both represents the object features of that class of objects and effectively incorporates the target style. Because the target fusion weights are learned from the target style code and the preset object code, the fusion weights of objects under different target styles can be adjusted adaptively, so that the object and style codes under each target style are fused better. On the basis of greatly improving the stylization effect and the quality of the stylized images, the adaptive generation efficiency of multi-style object style images can be greatly improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a schematic diagram illustrating an application environment in accordance with an illustrative embodiment;
FIG. 2 is a flow diagram illustrating an image generation method according to an exemplary embodiment;
FIG. 3 is a schematic network structure diagram of a style encoding network provided in accordance with an exemplary embodiment;
FIG. 4 is a flow diagram illustrating a pre-trained style-coding network in accordance with an exemplary embodiment;
fig. 5 is a flowchart illustrating a style fusion process performed on a target style code and a preset object code based on network fusion parameters corresponding to a preset number of network layers in a style fusion network to obtain a target style fusion code according to an exemplary embodiment;
FIG. 6 is a flow diagram illustrating a pre-trained target image generation network and style fusion network in accordance with an exemplary embodiment;
fig. 7 is a flowchart illustrating style fusion processing performed on a first sample style code and a sample object code based on sample network fusion parameters corresponding to a preset number of network layers to be trained in a style fusion network to be trained to obtain a sample style fusion code according to an exemplary embodiment;
FIG. 8 is a schematic diagram of a training style fusion network and a target image generation network provided in accordance with an exemplary embodiment;
FIG. 9 is a flow diagram illustrating an image generation method according to an exemplary embodiment;
FIG. 10 is a flow chart illustrating another method of image generation according to an exemplary embodiment;
FIG. 11 is a block diagram illustrating an image generation apparatus according to an exemplary embodiment;
FIG. 12 is a block diagram illustrating another image generation apparatus according to an exemplary embodiment;
FIG. 13 is a block diagram illustrating another image generation apparatus according to an exemplary embodiment;
FIG. 14 is a block diagram illustrating an electronic device for image generation in accordance with an exemplary embodiment;
FIG. 15 is a block diagram illustrating another electronic device for image generation in accordance with an exemplary embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
It should be noted that, the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for presentation, analyzed data, etc.) referred to in the present disclosure are information and data authorized by the user or sufficiently authorized by each party.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating an application environment according to an exemplary embodiment, and as shown in fig. 1, the application environment may include a terminal 100 and a server 200.
The terminal 100 can be used to provide a stylized image (object style image) generation service for a target object to any user. Specifically, the terminal 100 may include, but is not limited to, a smart phone, a desktop computer, a tablet computer, a notebook computer, a smart speaker, a digital assistant, an Augmented Reality (AR)/Virtual Reality (VR) device, a smart wearable device, and other types of electronic devices, and may also be software running on such electronic devices, such as an application program. Optionally, the operating system running on the electronic device may include, but is not limited to, Android, iOS, Linux, Windows, and the like.
In an alternative embodiment, server 200 may provide a background service for terminal 100 and pre-generate object style images for training the style conversion network, and the trained style conversion network may be used to convert object images into stylized images (object style images of a target style). Specifically, the server 200 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
In addition, it should be noted that fig. 1 shows only one application environment provided by the present disclosure, and in practical applications, other application environments may also be included, for example, more terminals may be included.
In the embodiment of the present specification, the terminal 100 and the server 200 may be directly or indirectly connected through wired or wireless communication, and the disclosure is not limited herein.
Fig. 2 is a flowchart illustrating an image generation method according to an exemplary embodiment, which is used in an electronic device such as a terminal or a server, as shown in fig. 2, and includes the following steps.
In step S201, a preset object code and a target style code of a target style are acquired;
in a particular embodiment, the target style may be any image style. Image styles can be divided in many ways according to actual application requirements. Alternatively, the target styles may include, but are not limited to, image styles such as animation, oil painting and pencil painting.
In an alternative embodiment, the target style encoding may be encoding information capable of characterizing style characteristics of the target style. The preset object code may be code information capable of characterizing the object characteristics of a certain class of objects. Alternatively, the object may include, but is not limited to, a human face, a cat face, a dog face, and the like, which require style conversion.
In an alternative embodiment, the target style may be an image style extracted from a reference style image, and accordingly, the obtaining of the target style code of the target style may include:
acquiring a reference style image of a target style;
and inputting the reference style image into a style coding network for style coding processing to obtain a target style code.
In a particular embodiment, the style image may be an image having a certain style. Taking a human face as the object and an animation style as the target style, the style image is an animation face image.
In a specific embodiment, the style coding network may be obtained by performing contrast training on a to-be-trained style coding network using positive sample style image pairs and negative sample style image pairs of the target style.
In a specific embodiment, the network structure of the style-coded network may be preset in conjunction with the actual application. In a specific embodiment, as shown in fig. 3, fig. 3 is a schematic diagram of a network structure of a style-coded network according to an exemplary embodiment. Specifically, the style coding network may include a convolutional neural network, a style feature extraction network, a feature splicing network, and a multilayer perceptual network connected in sequence.
In a specific embodiment, the convolutional neural network can be used for extracting image feature information of the reference style image; the style feature extraction network can be used for extracting style feature information from the image feature information; the feature splicing network can be used for splicing the style feature information into a long vector of a preset dimension (the preset dimension is consistent with the input dimension of the multilayer perception network); and the multilayer perception network can comprise two parts connected in sequence, where the first part can be used for dimension reduction, converting the long vector of the preset dimension into a style code with denser information so that the image generation network can be trained more easily, and the second part can be used for reducing the dimension of the coded information output by the first part and transforming it from the coding space to the distribution space of the image generation network.
Specifically, the specific network structures of the convolutional neural network, the style feature extraction network, the feature splicing network and the multilayer perception network can also be set in combination with practical application.
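As an illustration only, a minimal PyTorch sketch of such a style coding network is given below. The layer sizes, the pooling-based style feature extraction, and the 512-dimensional code are assumptions made for the sketch; the patent only fixes the four-stage structure (convolutional backbone, style feature extraction, feature splicing, and a two-part multilayer perception network).

```python
import torch
import torch.nn as nn

class StyleEncoder(nn.Module):
    """Sketch of the described style coding network (all sizes are assumptions)."""

    def __init__(self, code_dim=512):
        super().__init__()
        # Convolutional neural network: extracts image feature information
        # from the reference style image.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Style feature extraction: a global pooling stands in here for the
        # style feature extraction network.
        self.style_pool = nn.AdaptiveAvgPool2d(1)
        # First MLP part: compresses the spliced long vector into a denser style code.
        self.mlp_part1 = nn.Sequential(nn.Linear(256, code_dim), nn.ReLU())
        # Second MLP part: maps the code into the distribution space of the
        # image generation network.
        self.mlp_part2 = nn.Linear(code_dim, code_dim)

    def forward(self, reference_style_image):
        feats = self.backbone(reference_style_image)
        # Feature splicing: flatten the pooled style features into a long
        # vector whose dimension matches the MLP input.
        long_vec = self.style_pool(feats).flatten(1)
        return self.mlp_part2(self.mlp_part1(long_vec))

# target_style_code = StyleEncoder()(reference_style_image)  # shape (B, 512)
```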
In an optional embodiment, the method may further include: the step of pre-training the style code network, as shown in fig. 4, may include the following steps:
in step S401, a positive sample style image pair and a negative sample style image pair of a target style are acquired;
in step S403, inputting the positive sample style image pair and the negative sample style image pair into a to-be-trained style coding network for style coding processing, so as to obtain sample style codes corresponding to the positive sample style image pair and the negative sample style image pair respectively;
in step S405, the sample style codes are input into a perception network to be trained for perception processing, and sample perception feature information corresponding to a positive sample style image pair and a negative sample style image pair is obtained;
in step S407, determining contrast loss information according to the sample perceptual feature information;
in step S409, training the style coding network to be trained and the perceptual network to be trained based on the contrast loss information;
in step S411, the trained style coding network to be trained is used as the style coding network.
In a specific embodiment, the network structure of the style-coding network to be trained is the same as the network structure of the style-coding network, but the network parameters are different.
In a specific embodiment, a reference style image of the target style can be obtained, and by performing affine transformation on the reference style image multiple times, each transformed style image and the reference style image form a positive sample style image pair; optionally, the translation amounts of the multiple affine transformations may be different.
In a specific embodiment, a plurality of reference style images of non-target styles can be obtained, and each of them is combined with the reference style image of the target style to form a negative sample style image pair of the target style. Optionally, multiple negative sample style image pairs may also be formed by performing multiple affine transformations on a reference style image of a non-target style and combining the transformed images with the reference style image of the target style.
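As an illustration, the sketch below shows one way such positive and negative sample style image pairs could be assembled with torchvision; the RandomAffine ranges and the number of positive pairs are assumptions, since the patent only requires several affine transformations with different translation amounts.

```python
import torchvision.transforms as T

# Hypothetical augmentation; rotation/translation ranges are assumptions.
random_affine = T.RandomAffine(degrees=10, translate=(0.1, 0.1))

def build_sample_pairs(target_style_image, non_target_style_images):
    # Positive pairs: the target-style reference image paired with affine
    # transformed copies of itself.
    positive_pairs = [(target_style_image, random_affine(target_style_image))
                      for _ in range(4)]
    # Negative pairs: the target-style reference image paired with reference
    # images of other styles.
    negative_pairs = [(target_style_image, other) for other in non_target_style_images]
    return positive_pairs, negative_pairs
```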
In a specific embodiment, in order to train the to-be-trained style coding network with a self-supervised learning strategy, the sample style codes output by the to-be-trained style coding network for the positive sample style image pair and the negative sample style image pair are input into the to-be-trained perception network for perception processing, so as to obtain the sample perception feature information corresponding to the positive sample style image pair and the negative sample style image pair, and the contrast loss information is determined according to the sample perception feature information.
In a specific embodiment, a preset contrast loss function may be used in determining the contrast loss information according to the sample perceptual feature information. The preset contrast loss function may be any contrast loss function, for example, the NT-Xent contrast loss function (normalized temperature-scaled cross entropy loss function).
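For reference, a common NT-Xent implementation is sketched below; the temperature value and the use of all other batch samples as negatives are assumptions, as the patent does not fix the exact form of the preset contrast loss function.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z_i, z_j, temperature=0.5):
    """NT-Xent loss over a batch of positive pairs (z_i[k], z_j[k]);
    all other samples in the batch serve as negatives."""
    n = z_i.shape[0]
    z = F.normalize(torch.cat([z_i, z_j], dim=0), dim=1)   # (2N, D)
    sim = z @ z.t() / temperature                          # scaled cosine similarities
    sim.fill_diagonal_(float('-inf'))                      # exclude self-similarity
    # For row k its positive is row k + N, and vice versa.
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)]).to(sim.device)
    return F.cross_entropy(sim, targets)
```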
In a specific embodiment, training the style coding network to be trained and the perceptual network to be trained based on the contrast loss information may include: updating the network parameters in the style coding network to be trained and the perception network to be trained based on the contrast loss information; then, based on the updated networks, repeating the training iteration of inputting the positive sample style image pair and the negative sample style image pair into the to-be-trained style coding network for style coding processing, obtaining the corresponding sample style codes, determining the contrast loss information, and updating the network parameters in the to-be-trained style coding network and the to-be-trained perception network, until a first preset convergence condition is reached.
In a specific embodiment, in the case that the first preset convergence condition is reached, the current style coding network to be trained (the trained style coding network to be trained) is used as the style coding network.
In an alternative embodiment, the reaching of the first preset convergence condition may be that the number of training iterations reaches a first preset number of training. Optionally, the reaching of the first preset convergence condition may also be that the contrast loss information is smaller than a first preset threshold. In this embodiment of the present description, the first preset training number and the first preset threshold may be preset in combination with a training speed and accuracy of the network in practical application.
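A minimal sketch of this training iteration is shown below, assuming the style encoder, perception network, pair loader and nt_xent_loss from the sketches above; the optimizer choice, learning rate, iteration count and loss threshold are all assumed values.

```python
import torch

max_steps, loss_threshold = 100_000, 0.05   # first preset training number / threshold (assumed)
optimizer = torch.optim.Adam(
    list(style_encoder.parameters()) + list(perception_net.parameters()), lr=1e-4)

for step in range(max_steps):
    img_a, img_b = next(pair_loader)               # a batch of positive sample style image pairs
    z_a = perception_net(style_encoder(img_a))     # sample perception feature information
    z_b = perception_net(style_encoder(img_b))
    loss = nt_xent_loss(z_a, z_b)                  # contrast loss information
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if loss.item() < loss_threshold:               # first preset convergence condition
        break
```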
In the above embodiment, contrast training is performed on the to-be-trained style coding network using the positive sample style image pairs and negative sample style image pairs of the target style, so that self-supervised training of the style coding network can be realized and the accuracy with which the trained style coding network represents style features can be effectively ensured. Moreover, extracting the target style code of the target style from a style image of the target style based on the trained style coding network can effectively improve the accuracy with which the target style code represents the target style.
In an optional embodiment, the target style may also be an image style obtained by random sampling, and accordingly, the obtaining of the target style code of the target style includes:
randomly generating an initial style code based on the first preset distribution;
and inputting the initial style code into a first multilayer perception network for perception processing to obtain a target style code.
In a specific embodiment, the first predetermined distribution may be a predetermined encoding distribution, and optionally, the first predetermined distribution may include, but is not limited to, a gaussian distribution.
In a specific embodiment, the first multilayer perceptual network may be configured to perform dimension reduction on the initial style code and transform it from the encoding space (e.g., a Gaussian distribution space) to the distribution space in which the image generation network is located.
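A minimal sketch of this sampling and mapping step follows, assuming a Gaussian first preset distribution, a 512-dimensional code and a three-layer MLP (all assumptions). The preset object code of the later embodiment can be obtained in the same way with a second, separately parameterized MLP.

```python
import torch
import torch.nn as nn

code_dim = 512  # assumed code dimension

# First multilayer perception network: maps the randomly sampled code into
# the distribution space of the image generation network.
first_mlp = nn.Sequential(
    nn.Linear(code_dim, code_dim), nn.LeakyReLU(0.2),
    nn.Linear(code_dim, code_dim), nn.LeakyReLU(0.2),
    nn.Linear(code_dim, code_dim),
)

initial_style_code = torch.randn(1, code_dim)       # sampled from the first preset (Gaussian) distribution
target_style_code = first_mlp(initial_style_code)   # target style code
```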
In the embodiment, the initial style code is randomly generated, so that the generation efficiency and diversity of the style code can be improved, and the flexibility of generating the object style image is greatly improved; and after the initial style codes are randomly generated based on the first preset distribution, the sensing processing is carried out by combining the first multilayer sensing network, so that the image generation network can be more easily trained.
In an optional embodiment, the obtaining of the preset object code includes the following steps:
randomly generating an initial object code based on the second preset distribution;
and inputting the initial object code into a second multilayer perception network for perception processing to obtain a preset object code.
In a specific embodiment, the second predetermined distribution may be a predetermined encoding distribution, and optionally, the second predetermined distribution may include, but is not limited to, a gaussian distribution.
In a specific embodiment, the second multilayer perceptual network may be used to reduce the dimension of the initial object code and transform it from the encoding space (e.g., a Gaussian distribution space) to the distribution space in which the image generation network is located.
In the above embodiment, randomly generating the initial object code can improve the generation efficiency of object codes, so that a large number of object style images can be quickly generated for a given style, effectively improving the image generation efficiency for that style; and performing perception processing with the second multilayer perception network after randomly generating the initial object code based on the second preset distribution makes the image generation network easier to train.
In step S203, style fusion processing is performed on the target style code and the preset object code based on the network fusion parameters corresponding to the preset number of network layers in the style fusion network, so as to obtain the target style fusion code.
In an alternative embodiment, the style fusion network may be used to perform style fusion processing on the target style code and the preset object code. Specifically, the style fusion network may include a preset number of network layers; specifically, the number of network layers (preset number) may be set in combination with the actual application.
In an optional embodiment, the style fusion network may be a style fusion network capable of adjusting the fusion degree and adaptively adjusting the fusion weight. Optionally, the fusion degree may be controlled by regulating the fusion position of the target style code and the preset object code in the style fusion network (i.e., from which network layer fusion starts). Optionally, fusion weight learning may be performed on the target style code and the preset object code, so as to adaptively adjust the fusion weight of the object under different target styles.
In a specific embodiment, the network fusion parameter may be determined based on fusion data corresponding to a preset number of network layers and a target fusion weight, where the target fusion weight is obtained by performing fusion weight learning based on a target style code and a preset object code.
In an optional embodiment, as shown in fig. 5, the performing style fusion processing on the target style code and the preset object code based on the network fusion parameters corresponding to the preset number of network layers in the style fusion network to obtain the target style fusion code may include the following steps:
in step S501, target fusion regulation and control data is acquired.
In a specific embodiment, the target fusion regulation and control data is used for regulating and controlling a fusion position of the target style code and the preset object code in the style fusion network.
In step S503, determining fusion data corresponding to a preset number of network layers according to the target fusion regulation and control data;
in an optional embodiment, the determining, according to the target fusion regulation and control data, the fusion data corresponding to a preset number of network layers may include:
comparing the number of layers corresponding to the preset number of network layers with the target fusion regulation and control data to obtain a comparison result;
and determining, according to the comparison result, the fusion data corresponding to the preset number of network layers.
In a specific embodiment, the fusion data corresponding to each network layer may characterize whether the target style code participates in the fusion at the network layer. Specifically, the preset number of network layers included in the style fusion network may be a preset number of network layers arranged in sequence, for example, the 0 th layer to the nth layer; specifically, under the condition that the comparison result indicates that the number of layers of the network layer is smaller than the target fusion regulation and control data, the fusion data corresponding to the network layer is 0, that is, the target style code does not participate in the fusion in the network layer; otherwise, when the comparison result indicates that the number of layers of the network layer is greater than or equal to the target fusion regulation and control data, the fusion data corresponding to the network layer is 1, that is, the target style code participates in the fusion at the network layer.
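A small sketch of this comparison, assuming the fusion data is held as a per-layer 0/1 vector (the exact tensor layout is not fixed by the patent):

```python
import torch

def fusion_data(num_layers, regulation):
    # 0 for layers whose index is below the target fusion regulation and
    # control data (style code does not participate), 1 otherwise.
    layer_index = torch.arange(num_layers)
    return (layer_index >= regulation).float()

mask = fusion_data(18, 6)
# tensor([0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
```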
In the above embodiment, whether the target style code participates in fusion in each of the network layers is controlled based on the fusion data determined from the target fusion regulation and control data, so that adjustable style feature fusion can be realized. This ensures that object style images with high similarity to the natural object and different stylization degrees can be generated subsequently, enables flexible regulation of the stylization strength of the object, better meets the requirements of different scenes, and greatly improves both the image quality of the generated object style images and their generation efficiency.
In step S505, the target style code and the preset object code are spliced to obtain a target splicing code.
In step S507, fusion weight learning is performed based on the target splicing code to obtain a target fusion weight.
In an alternative embodiment, the target splicing code may be input into a fully-connected layer to perform fusion weight learning, so as to obtain the target fusion weight. Specifically, the target fusion weight may be used to control the fusion ratio of the target style code and the preset object code during fusion.
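A minimal sketch of this fusion weight learning step; the per-layer output size and the sigmoid used to keep the weights in [0, 1] are assumptions, since the patent only states that the target splicing code is passed through a fully connected layer.

```python
import torch
import torch.nn as nn

code_dim, num_layers = 512, 18   # assumed sizes

# Fully connected layer that learns one fusion weight per network layer
# from the target splicing code.
weight_head = nn.Sequential(nn.Linear(2 * code_dim, num_layers), nn.Sigmoid())

style_code = torch.randn(1, code_dim)
object_code = torch.randn(1, code_dim)
splice_code = torch.cat([style_code, object_code], dim=1)   # target splicing code
target_fusion_weight = weight_head(splice_code)             # shape (1, num_layers)
```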
In the above embodiment, fusion weight learning is performed on the target splicing code that combines the target style code and the preset object code, so that the learned target fusion weight adapts to different types of objects and styles, and the object code and the target style code are therefore fused more effectively.
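A minimal sketch of steps S505 and S507 follows, assuming the codes are vectors of dimension dim and that a single fully-connected layer followed by a sigmoid produces the per-dimension fusion weight; the class name, the dimensions and the sigmoid squashing are illustrative assumptions rather than the disclosed implementation.

```python
import torch
import torch.nn as nn

class FusionWeightLearner(nn.Module):
    """Learn a target fusion weight from the target splicing code."""

    def __init__(self, dim: int):
        super().__init__()
        # Fully-connected layer mapping the concatenated codes to one weight per dimension
        self.fc = nn.Linear(2 * dim, dim)

    def forward(self, style_code: torch.Tensor, object_code: torch.Tensor) -> torch.Tensor:
        spliced = torch.cat([style_code, object_code], dim=-1)  # target splicing code (step S505)
        return torch.sigmoid(self.fc(spliced))                  # fusion ratio kept in [0, 1] (step S507)

# Usage with assumed 512-dimensional codes and a batch of 4:
learner = FusionWeightLearner(512)
target_fusion_weight = learner(torch.randn(4, 512), torch.randn(4, 512))  # shape (4, 512)
```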
In step S509, the fusion data and the target fusion weight are weighted to obtain a network fusion parameter.
In a specific embodiment, the target fusion weight and the fusion data are both data in a matrix form, and accordingly, the target fusion weight and corresponding elements in the fusion data may be multiplied to obtain the network fusion parameter.
In step S511, style fusion processing is performed on the target style code and the preset object code in a preset number of network layers based on the network fusion parameters, so as to obtain a target style fusion code.
In a specific embodiment, the network fusion parameters corresponding to the preset number of network layers may be used as the weight of the target style code, the difference obtained by subtracting the network fusion parameters from an all-ones matrix may be used as the weight of the preset object code, and the target style code and the preset object code are then weighted and summed with these two weights to obtain the target style fusion code.
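Under the same illustrative assumptions as the sketches above, steps S509 and S511 reduce to an element-wise blend per network layer; the function name is hypothetical.

```python
import torch

def fuse_codes(style_code, object_code, layer_fusion_data, fusion_weight):
    """Blend the target style code and the preset object code for one network layer.

    layer_fusion_data is the 0/1 fusion-data entry for this layer and
    fusion_weight is the learned target fusion weight; their product is the
    network fusion parameter of the layer (step S509).
    """
    net_fusion_param = layer_fusion_data * fusion_weight
    # Weighted summation (step S511): the style code is weighted by the network fusion
    # parameter, the object code by (all-ones matrix - network fusion parameter).
    return net_fusion_param * style_code + (torch.ones_like(net_fusion_param) - net_fusion_param) * object_code
```

When a layer's fusion data is 0, the layer simply keeps the preset object code; when it is 1, the learned target fusion weight alone decides the blending ratio.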
In a specific embodiment, assume that the style fusion network includes 18 network layers. When the target fusion regulation and control data is 0, the target style code and the preset object code are fused starting from the 1st network layer, and the subsequent image generation network accordingly outputs a completely stylized object style image. When the target fusion regulation and control data is 18, the target style code does not participate in the fusion, and the subsequent image generation network accordingly outputs the natural, non-stylized object image. When the target fusion regulation and control data is a value i greater than 1 and smaller than 18, the target style code and the preset object code are fused starting from the i-th layer, and the low-resolution layers (namely the 1st layer to the (i-1)-th layer) are not influenced by the target style code. Based on the target fusion regulation and control data, the target style fusion code can therefore be guaranteed to keep the same feature information as the natural object while carrying style features of different degrees, which in turn ensures that object style images with high similarity to the natural object and with different stylization degrees can be generated subsequently, and enables flexible regulation and control of the stylization strength of the object. Correspondingly, the fusion position regulated and controlled by the target fusion regulation and control data may be the starting fusion position of the target style code and the preset object code in the style fusion network.
In the above embodiment, within the style fusion network, the fusion position of the target style code and the preset object code across the network layers is controlled by the target fusion regulation and control data, so that the fusion degree can be regulated. Fusion weight learning is further performed on the target splicing code that combines the target style code and the preset object code, so that the learned target fusion weight adapts to different types of objects and styles and the fusion weight is adjusted adaptively for objects under different target styles. The target style code is therefore fused more effectively, and the obtained target style fusion code carries adjustable style features and an adaptive fusion weight while keeping the same feature information as the natural object. This ensures that object style images with high similarity to the natural object and with different stylization degrees can be generated subsequently, enables flexible regulation and control of the stylization strength of the object, improves the adaptive generation efficiency of multi-style object style images, better meets the requirements of different scenes, and greatly improves both the image quality of the generated object style images and their generation efficiency.
In step S205, the target style fusion code is input into the target image generation network to perform image generation processing, so as to obtain a preset object style image corresponding to the target style.
In a specific embodiment, the target image generation network may be configured to generate the preset object style image corresponding to the target style. In an optional embodiment, the method further includes pre-training the target image generation network and the style fusion network; specifically, as shown in fig. 6, this pre-training step may include:
in step S601, a first sample style code of a target style, a sample object code, a second sample style code of a non-target style, a preset style object image, and a preset object image are acquired.
In a specific embodiment, the obtaining manner of the first sample style code and the second sample style code may refer to the obtaining manner of the target style code, and is not described herein again. The obtaining method of the sample object code may refer to the obtaining method of the preset object code, and is not described herein again.
In a specific embodiment, the preset style object image may be obtained from a stylized object image training set corresponding to a target style, and taking an object as a face as an example, a preset style face image may be obtained from a collected face style image training set of the target style. Specifically, the preset object image may be an original object image, and for example, the object is a human face, and may be an image of a certain real human face.
In step S603, style fusion processing is performed on the first sample style code and the sample object code based on sample network fusion parameters corresponding to a preset number of network layers to be trained in the style fusion network to be trained, so as to obtain a sample style fusion code.
In an optional embodiment, the to-be-trained style fusion network may include a preset number of to-be-trained network layers; the sample network fusion parameters can be determined based on sample fusion data and sample fusion weights corresponding to a preset number of network layers to be trained, and the sample fusion weights are obtained by performing fusion weight learning based on first sample style codes and sample object codes.
In an optional embodiment, the sample style fusion code includes a first style fusion code and a second style fusion code; correspondingly, as shown in fig. 7, the performing style fusion processing on the first sample style code and the sample object code based on the sample network fusion parameters corresponding to the preset number of network layers to be trained in the style fusion network to be trained to obtain the sample style fusion code may include the following steps:
in step S701, first fusion regulation and control data and second fusion regulation and control data are acquired.
In a specific embodiment, the first fusion regulation and control data is used for regulating and controlling the first sample style code and the sample object code to start fusion from the first network layer in the to-be-trained style fusion network; in the embodiment of the target fusion regulation and control data described above, the first fusion regulation and control data is 1. Specifically, the second fusion regulation and control data may be used to regulate and control the first sample style code so that it does not participate in the fusion in the to-be-trained style fusion network; in the above embodiment in which the preset number is 18, the second fusion regulation and control data is 18.
In step S703, respectively determining first sample fusion data and second sample fusion data corresponding to a preset number of network layers to be trained according to the first fusion regulation data and the second fusion regulation data;
in a specific embodiment, according to the first fusion regulation and control data, determining first sample fusion data corresponding to a preset number of network layers to be trained; and determining second sample fusion data corresponding to the preset number of network layers to be trained according to the second fusion regulation and control data.
In step S705, the first sample style code and the sample object code are spliced to obtain a sample splicing code;
in step S707, fusion weight learning is performed based on the sample splicing codes to obtain a sample fusion weight;
in step S709, weighting the first sample fusion data and the second sample fusion data with the sample fusion weight, respectively, to obtain a first sample network fusion parameter and a second sample network fusion parameter;
in step S711, based on the first sample network fusion parameter and the second sample network fusion parameter, style fusion processing is performed on the first sample style code and the sample object code in a preset number of network layers to be trained, respectively, so as to obtain a first style fusion code and a second style fusion code.
In a specific embodiment, specific details of steps S705 to S711 may refer to those of steps S505 to S511, and are not described herein again.
In the above embodiment, during network training, the first fusion regulation and control data and the second fusion regulation and control data are combined so that style fusion processing of different degrees can be performed on the sample object code and the first sample style code, and the object stylization strength can be regulated flexibly, which improves the image quality of subsequently generated object style images and the efficiency of generating them.
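As an illustrative continuation of the hypothetical helpers sketched above (fusion_data_per_layer, FusionWeightLearner and fuse_codes), the two training branches of steps S701 to S711 might be wired together as follows; all names, dimensions and the 0-based layer indexing remain assumptions.

```python
import torch

num_layers = 18
first_regulation, second_regulation = 1, num_layers   # fully stylized vs. style code not fused

# Placeholder sample codes (in practice these come from the encoding networks)
first_sample_style_code = torch.randn(4, 512)
sample_object_code = torch.randn(4, 512)

first_data = fusion_data_per_layer(num_layers, first_regulation)    # first sample fusion data
second_data = fusion_data_per_layer(num_layers, second_regulation)  # second sample fusion data

learner = FusionWeightLearner(512)
sample_fusion_weight = learner(first_sample_style_code, sample_object_code)  # step S707

# One fused code per to-be-trained network layer, for each regulation setting (steps S709/S711)
first_style_fusion_code = [fuse_codes(first_sample_style_code, sample_object_code, d, sample_fusion_weight)
                           for d in first_data]
second_style_fusion_code = [fuse_codes(first_sample_style_code, sample_object_code, d, sample_fusion_weight)
                            for d in second_data]
```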
In step S605, the sample style fusion code is input to the image generation network to be trained to perform image generation processing, so as to obtain a sample object style image corresponding to the target style;
in an alternative embodiment, the sample object style images may include a first sample object style image corresponding to a first style fusion code and a second sample object style image corresponding to a second style fusion code; correspondingly, the step of inputting the sample style fusion code into the to-be-trained image generation network for image generation processing to obtain the sample object style image corresponding to the target style comprises: and inputting the first style fusion code and the second style fusion code into an image generation network to be trained for image generation processing to obtain a first sample object style image and a second sample object style image.
In step S607, the sample object style image, the preset style object image, the preset object image, the first sample style code and the second sample style code are input to a discrimination network to be trained for style discrimination processing, so as to obtain target discrimination information;
in a specific embodiment, the to-be-trained discrimination network includes an object discrimination network, a style object discrimination network, and a style code discrimination network; correspondingly, the target discrimination information may include object discrimination information, style object discrimination information, and style code discrimination information;
in an optional embodiment, in a case that the to-be-trained style fusion network is a network with a fixed fusion structure, inputting the sample object style image, the preset style object image, the preset object image, the first sample style code and the second sample style code into the to-be-trained discrimination network for style discrimination processing to obtain the target discrimination information includes: inputting the sample object style image and the preset object image into the object discrimination network for object discrimination processing to obtain the object discrimination information (which may include the feature information, output by the object discrimination network, corresponding to the sample object style image and the preset object image); inputting the sample object style image and the preset style object image into the style object discrimination network for style object discrimination processing to obtain the style object discrimination information (which may include the feature information, output by the style object discrimination network, corresponding to the sample object style image and the preset style object image); and inputting the sample object style image, the first sample style code and the second sample style code into the style code discrimination network for style code discrimination processing to obtain the style code discrimination information (which may include the feature information, output by the style code discrimination network, corresponding to the first sample style code and the second sample style code).
In another optional embodiment, in a case that the to-be-trained style fusion network is a network whose fusion degree can be regulated and controlled, inputting the sample object style image, the preset style object image, the preset object image, the first sample style code and the second sample style code into the to-be-trained discrimination network for style discrimination processing to obtain the target discrimination information may include: inputting the second sample object style image and the preset object image into the object discrimination network for object discrimination processing to obtain the object discrimination information (which may include the feature information, output by the object discrimination network, corresponding to the second sample object style image and the preset object image); inputting the first sample object style image and the preset style object image into the style object discrimination network for style object discrimination processing to obtain the style object discrimination information (which may include the feature information, output by the style object discrimination network, corresponding to the first sample object style image and the preset style object image); and inputting the second sample object style image, the first sample style code and the second sample style code into the style code discrimination network for style code discrimination processing to obtain the style code discrimination information (which may include the feature information, output by the style code discrimination network, corresponding to the first sample style code and the second sample style code).
In this embodiment, adversarial training is performed on the to-be-trained image generation network that generates the object style image from the three dimensions of object discrimination, style object discrimination and style code discrimination, so that the representation capability of the trained image generation network for stylized object images can be greatly improved, and the quality of the generated object style images is improved.
In step S609, target loss information is determined according to the target discrimination information;
in a specific embodiment, the target loss information may include generation loss information corresponding to an image generation network to be trained and discrimination loss information corresponding to a discrimination network to be trained.
In a specific embodiment, an adversarial loss function may be used when determining the target loss information based on the target discrimination information. Specifically, the object discrimination loss between the feature information, output by the object discrimination network, corresponding to the second sample object style image and the preset object image may be determined with the adversarial loss function; the style object discrimination loss between the feature information, output by the style object discrimination network, corresponding to the first sample object style image and the preset style object image may be determined with the adversarial loss function; and the style code discrimination loss between the feature information, output by the style code discrimination network, corresponding to the first sample style code and the second sample style code may be determined with the adversarial loss function.
Further, the generation loss information may be obtained by adding the object discrimination loss, the style object discrimination loss, and the style code discrimination loss, and the negative of the generation loss information may be used as the discrimination loss information.
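A rough sketch of this loss assembly follows; the specific adversarial formulation (a WGAN-style difference of discriminator score means) is an assumption, since the disclosure only states that an adversarial loss function is combined with the discriminator feature outputs.

```python
import torch

def adversarial_loss(fake_score: torch.Tensor, real_score: torch.Tensor) -> torch.Tensor:
    """Assumed WGAN-style adversarial loss over discriminator outputs."""
    return fake_score.mean() - real_score.mean()

def assemble_target_loss(obj_fake, obj_real, style_obj_fake, style_obj_real, code_fake, code_real):
    object_loss = adversarial_loss(obj_fake, obj_real)                    # object discrimination loss
    style_object_loss = adversarial_loss(style_obj_fake, style_obj_real)  # style object discrimination loss
    style_code_loss = adversarial_loss(code_fake, code_real)              # style code discrimination loss
    generation_loss = object_loss + style_object_loss + style_code_loss
    discrimination_loss = -generation_loss                                # negative of the generation loss
    return generation_loss, discrimination_loss
```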
In step S611, based on the target loss information, the style fusion network to be trained, the image generation network to be trained, and the discrimination network to be trained are trained.
In a specific embodiment, training the to-be-trained style fusion network, the to-be-trained image generation network, and the to-be-trained discrimination network based on the target loss information may include: updating the network parameters of the to-be-trained image generation network and the to-be-trained style fusion network based on the generation loss information, and updating the network parameters of the to-be-trained discrimination network (the object discrimination network, the style object discrimination network and the style code discrimination network) based on the discrimination loss information; then, based on the updated networks, repeating the training iteration from the step of performing style fusion processing on the first sample style code and the sample object code based on the sample network fusion parameters corresponding to the preset number of to-be-trained network layers in the to-be-trained style fusion network to obtain the sample style fusion code, through to the step of updating the network parameters based on the generation loss information and the discrimination loss information, until a second preset convergence condition is reached.
In an alternative embodiment, the second preset convergence condition may be that the number of training iterations reaches a second preset number of training iterations. Optionally, the second preset convergence condition may be that the generation loss information is smaller than a second preset threshold. In this embodiment of the present description, the second preset number of training iterations and the second preset threshold may be set in advance in combination with the training speed and accuracy of the network in practical applications.
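A high-level sketch of this training iteration is given below; training_step, the module objects and the optimizers are placeholders, and the way the two losses are back-propagated is only one possible arrangement.

```python
def train_until_converged(fusion_net, gen_net, discriminators, data_iter,
                          gen_optimizer, disc_optimizer,
                          max_iters: int, loss_threshold: float):
    """Alternate updates of the generation/fusion networks and the discrimination
    networks until the second preset convergence condition is reached."""
    for step, batch in enumerate(data_iter):
        # Hypothetical helper: runs steps S603-S609 and returns the two losses
        generation_loss, discrimination_loss = training_step(
            fusion_net, gen_net, discriminators, batch)

        gen_optimizer.zero_grad()
        generation_loss.backward(retain_graph=True)
        gen_optimizer.step()        # update image generation + style fusion networks

        disc_optimizer.zero_grad()
        discrimination_loss.backward()
        disc_optimizer.step()       # update the three discrimination networks

        # Second preset convergence condition: iteration budget reached, or
        # generation loss below the second preset threshold
        if step + 1 >= max_iters or generation_loss.item() < loss_threshold:
            break
    return fusion_net, gen_net
```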
In step S613, the trained style fusion network to be trained is used as the style fusion network, and the trained image generation network to be trained is used as the target image generation network.
In a specific embodiment, when the second preset convergence condition is reached, the current style fusion network to be trained (the trained style fusion network to be trained) is used as the style fusion network, and the current image generation network to be trained (the trained image generation network to be trained) is used as the target image generation network.
In a specific embodiment, as shown in fig. 8, fig. 8 is a schematic diagram of a training style fusion network and a target image generation network provided according to an exemplary embodiment.
In the embodiment, the target image generation network and the style fusion network are jointly trained, so that the style features and the object features can be fused, the target image generation network is trained based on the fused sample object style images, the representation capability of the trained image generation network on the object styles and the object features can be greatly improved, and the quality of the subsequently generated object style images is effectively improved.
In a specific embodiment, a large number of preset object style images in a target style may be generated based on the image generation method provided in the embodiments of the present disclosure. Accordingly, adversarial training may be performed on a first preset image generation network based on first sample object images (object images of a large number of real objects) and these preset object style images of the target style, so as to obtain a first style conversion network. The first style conversion network may be used to generate a target style object image of a target object. Specifically, during the adversarial training, a first sample object image may be input into the first preset image generation network for style conversion processing to obtain an object style image corresponding to the first sample object image; the object style image and the corresponding preset object style image are input into a corresponding discrimination network for style discrimination processing to obtain first style discrimination information; corresponding discrimination loss information is determined based on the first style discrimination information; the first preset image generation network and the corresponding discrimination network are then trained based on the discrimination loss information, and the trained first preset image generation network is used as the first style conversion network.
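A compact sketch of such an adversarial training loop is shown below; the non-saturating GAN loss, the module names and the data handling are assumptions made for illustration, not the disclosed implementation.

```python
import torch
import torch.nn.functional as F

def train_first_style_conversion(gen, disc, sample_object_images, preset_style_images,
                                 gen_opt, disc_opt, iters: int):
    """Adversarially train a first preset image generation network into a
    first style conversion network; gen, disc and the optimizers are placeholders."""
    for i in range(iters):
        source = sample_object_images[i % len(sample_object_images)]   # first sample object image
        real = preset_style_images[i % len(preset_style_images)]       # generated preset object style image

        fake = gen(source)                        # style conversion processing

        # Discriminator update: distinguish real preset style images from converted ones
        disc_loss = F.softplus(disc(fake.detach())).mean() + F.softplus(-disc(real)).mean()
        disc_opt.zero_grad(); disc_loss.backward(); disc_opt.step()

        # Generator update: the first style discrimination information drives the generator
        gen_loss = F.softplus(-disc(fake)).mean()
        gen_opt.zero_grad(); gen_loss.backward(); gen_opt.step()
    return gen    # the trained network is used as the first style conversion network
```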
In a specific embodiment, the image generation method provided in the embodiments of the present disclosure may generate preset object style images of multiple target styles. Accordingly, adversarial training may be performed on a second preset image generation network based on second sample object images (object images of a large number of real objects) carrying multiple target style labels and the preset object style images of the multiple target styles generated by the image generation method, so as to obtain a second style conversion network. The second style conversion network may be used to generate object style images of a plurality of target styles. Specifically, during this adversarial training, a second sample object image and its target style label may be input into the second preset image generation network for style conversion processing to obtain an object style image that corresponds to the second sample object image and matches the target style label; the object style image and the corresponding preset object style image are input into a corresponding discrimination network for style discrimination processing to obtain second style discrimination information; an additional discrimination network may further be used to judge whether the style of the object style image output by the second preset image generation network corresponds to the target style label, so as to obtain third style discrimination information; corresponding loss information is determined based on the second style discrimination information and the third style discrimination information; the second preset image generation network and the corresponding discrimination networks may then be trained based on the loss information, and the trained second preset image generation network is used as the second style conversion network.
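The multi-style case differs from the sketch above mainly in conditioning the generator on the target style label and adding a discrimination branch that checks the label; a minimal hypothetical sketch follows (the module names and the cross-entropy label loss are assumptions).

```python
import torch
import torch.nn.functional as F

def label_conditioned_step(gen, style_classifier, object_image, target_style_label):
    """One forward step of the assumed second style conversion training setup."""
    fake = gen(object_image, target_style_label)         # style conversion matching the label
    logits = style_classifier(fake)                       # third style discrimination branch
    label_loss = F.cross_entropy(logits, target_style_label)
    return fake, label_loss
```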
As can be seen from the technical solutions provided by the embodiments of the present specification, in the process of generating a stylized image (preset object style image) of a certain type of object, the stylized image is decoupled into two parts, namely the object code and the style code. The network fusion parameters in the style fusion network are determined by combining the target fusion weight with the fusion data corresponding to the preset number of network layers, and style fusion processing is performed on the target style code and the preset object code; through this fusion, a target style fusion code that both represents the object features of the object and effectively incorporates the target style can be obtained. Since the target fusion weight is learned from the target style code and the preset object code, the fusion weight of the object can be adjusted adaptively under different target styles, so that the object code is fused with the target style more effectively. On the basis of greatly improving the stylization effect and the quality of stylized images, the adaptive generation efficiency of multi-style object style images can also be greatly improved.
Fig. 9 is a flowchart illustrating another image generation method according to an exemplary embodiment. As shown in fig. 9, the method is used in an electronic device such as a terminal or a server, and includes the following steps.
In step S901, a first original object image of a first target object is acquired;
in step S903, inputting the first original object image into a first style conversion network to perform style conversion processing, so as to obtain a first target object style image corresponding to the first target object;
in a specific embodiment, the first original object image may be an object image of the first target object uploaded by the user through the terminal. Taking the first target object as the face of a certain user as an example, the first original object image may be a real face image of the user.
In a particular embodiment, the first target object style image may be a target style object image of the first target object.
Optionally, the terminal may input the first original object image into a first style conversion network to perform style conversion processing, so as to obtain a first target object style image corresponding to the first target object. The terminal can also send the first original object image to the server, and the server generates a first target object style image based on the first style conversion network and transmits the first target object style image to the terminal.
In the above embodiment, the style conversion is performed on the original object image of the first target object by combining the first style conversion network obtained by training the preset object style image which retains the object features of the natural object and effectively incorporates the style features of the target style, so that the consistency between the object features in the style-converted first target object style image and the object features of the first target object can be ensured on the basis of effectively improving the stylized effect, and the quality of the stylized image is further greatly improved.
Fig. 10 is a flowchart illustrating another image generation method according to an exemplary embodiment. As shown in fig. 10, the method is used in an electronic device such as a terminal or a server, and includes the following steps.
In step S1001, a second original object image and a target style label of a second target object are acquired;
in step S1003, inputting the second original object image and the target style label into a second style conversion network to perform style conversion processing, so as to obtain a second target object style image corresponding to the second target object;
in a specific embodiment, the second original object image may be an object image of a second target object uploaded by a user through the terminal. Taking the second target object as the face of a certain user as an example, the second original object image may be a real face image of the user. The target style label may be identification information of a certain style selected by the user.
In a specific embodiment, the second target object style image may be an object image of a style corresponding to the target style label of the second target object.
Optionally, the terminal may input the second original object image and the target style label into a second style conversion network to perform style conversion processing, so as to obtain a second target object style image corresponding to the second target object. The terminal can also send the second original object image and the target object style label to the server, and the server generates a second target object style image based on the second style conversion network and transmits the second target object style image to the terminal.
In the above embodiment, style conversion of the style corresponding to the target style label is performed on the original object image of the second target object by combining the second style conversion network obtained by training the preset object style image which retains the object features of the natural object and effectively integrates the style features of multiple target styles, so that on the basis of effectively improving the stylized effect, consistency between the object features in the style-converted second target object style image and the object features of the second target object can be ensured, and further, the quality of the stylized image is greatly improved.
FIG. 11 is a block diagram illustrating an image generation apparatus according to an exemplary embodiment. Referring to fig. 11, the apparatus includes:
a code acquiring module 1110 configured to perform acquiring a preset object code and a target style code of a target style;
a first style fusion processing module 1120, configured to execute a style fusion processing on the target style code and the preset object code based on network fusion parameters corresponding to a preset number of network layers in the style fusion network, so as to obtain a target style fusion code; the network fusion parameters are determined based on fusion data corresponding to a preset number of network layers and target fusion weights, and the target fusion weights are obtained by performing fusion weight learning based on target style codes and preset object codes;
and the first image generation processing module 1130 is configured to input the target style fusion code into a target image generation network for image generation processing to obtain a preset object style image corresponding to the target style.
Optionally, the first style fusion processing module 1120 includes:
the target fusion regulation and control data acquisition unit is configured to acquire target fusion regulation and control data, and the target fusion regulation and control data are used for regulating and controlling the fusion position of the target style code and the preset object code in the style fusion network;
the fusion data determining unit is configured to determine fusion data corresponding to a preset number of network layers according to the target fusion regulation and control data;
the first splicing processing unit is configured to perform splicing processing on the target style code and a preset object code to obtain a target splicing code;
the first fusion weight learning unit is configured to execute fusion weight learning based on the target splicing codes to obtain target fusion weights;
the first weighting processing unit is configured to perform weighting processing on the fusion data and the target fusion weight to obtain a network fusion parameter;
and the first style fusion processing unit is configured to execute style fusion processing on the target style code and the preset object code in a preset number of network layers based on the network fusion parameters to obtain a target style fusion code.
Optionally, the fused data determining unit includes:
the comparison unit is configured to compare the layer numbers of the preset number of network layers with the target fusion regulation and control data to obtain a comparison result;
and the fusion data determining subunit is configured to determine, according to the comparison result, the fusion data corresponding to the preset number of network layers.
Optionally, the code acquiring module 1110 includes:
a reference style image acquisition unit configured to perform acquisition of a reference style image of a target style;
and the style coding processing unit is configured to input the reference style image into a style coding network for style coding processing to obtain a target style code.
Optionally, the apparatus further comprises:
a sample image acquisition module configured to perform acquiring a positive sample style image pair and a negative sample style image pair of a target style;
the style coding processing module is configured to perform style coding processing on the positive sample style image pair and the negative sample style image pair which are input into a to-be-trained style coding network to obtain respective corresponding sample style codes of the positive sample style image pair and the negative sample style image pair;
the perception processing module is configured to input the sample style codes into a perception network to be trained for perception processing to obtain sample perception characteristic information corresponding to the positive sample style image pair and the negative sample style image pair;
a contrast loss information determination module configured to perform determining contrast loss information according to the sample perceptual feature information;
a first network training module configured to perform training of a to-be-trained style encoding network and a to-be-trained perception network based on the comparison loss information;
and the style coding network determining module is configured to execute the trained style coding network to be trained as the style coding network.
Optionally, the code acquiring module 1110 includes:
an initial style code generation unit configured to perform random generation of an initial style code based on a first preset distribution;
and the first perception processing unit is configured to input the initial style code into a first multilayer perception network for perception processing to obtain a target style code.
Optionally, the code acquiring module 1110 includes:
an initial object code generation unit configured to perform random generation of an initial object code based on a second preset distribution;
and the second perception processing unit is configured to input the initial object code into a second multilayer perception network for perception processing to obtain a preset object code.
Optionally, the apparatus further comprises:
the system comprises a sample data acquisition module, a target data acquisition module and a target data acquisition module, wherein the sample data acquisition module is configured to execute the steps of acquiring a first sample style code of a target style, a sample object code, a second sample style code of a non-target style, a preset style object image and a preset object image;
the second style fusion processing module is configured to perform style fusion processing on the first sample style code and the sample object code based on sample network fusion parameters corresponding to a preset number of network layers to be trained in the style fusion network to be trained to obtain a sample style fusion code;
the second image generation processing module is configured to execute the step of inputting the sample style fusion code into an image generation network to be trained to perform image generation processing so as to obtain a sample object style image corresponding to a target style;
the style discrimination processing module is configured to input the sample object style image, the preset style object image, the preset object image, the first sample style code and the second sample style code into a discrimination network to be trained for style discrimination processing to obtain target discrimination information;
a target loss information determination module configured to perform determining target loss information according to the target discrimination information;
the second network training module is configured to train the style fusion network to be trained, the image generation network to be trained and the discrimination network to be trained on the basis of the target loss information;
and the network determining module is configured to execute the trained style fusion network to be trained as a style fusion network and the trained image generation network to be trained as a target image generation network.
Optionally, the sample style fusion code includes a first style fusion code and a second style fusion code; the second style fusion processing module comprises:
the system comprises a sample fusion regulation and control data acquisition unit, a first network layer fusion control unit and a second network layer fusion control unit, wherein the sample fusion regulation and control data acquisition unit is configured to acquire first fusion regulation and control data and second fusion regulation and control data, the first fusion regulation and control data is used for regulating and controlling first sample style codes and sample object codes, and fusion starts from the first network layer in a style fusion network to be trained; the second fusion regulation and control data are used for regulating and controlling the first sample style code not to participate in fusion in the style fusion network to be trained;
the sample fusion data determining unit is configured to determine, according to the first fusion regulation and control data and the second fusion regulation and control data, the first sample fusion data and the second sample fusion data corresponding to the preset number of network layers to be trained respectively;
the second splicing processing unit is configured to perform splicing processing on the first sample style code and the sample object code to obtain a sample splicing code;
the second fusion weight learning unit is configured to execute fusion weight learning based on the sample splicing codes to obtain sample fusion weights;
the second weighting processing unit is configured to perform weighting processing on the first sample fusion data and the second sample fusion data and the sample fusion weight respectively to obtain a first sample network fusion parameter and a second sample network fusion parameter;
and the second style fusion processing unit is configured to perform style fusion processing on the first sample style code and the sample object code in the preset number of network layers to be trained respectively based on the first sample network fusion parameter and the second sample network fusion parameter to obtain the first style fusion code and the second style fusion code.
Optionally, the sample object style images include a first sample object style image corresponding to the first style fusion code and a second sample object style image corresponding to the second style fusion code; the to-be-trained discrimination network comprises an object discrimination network, a style object discrimination network and a style code discrimination network; the target discrimination information comprises object discrimination information, style object discrimination information and style coding discrimination information;
the style discrimination processing module comprises:
the object discrimination processing unit is configured to input the second sample object style image and the preset object image into the object discrimination network for object discrimination processing to obtain the object discrimination information;
the style object discrimination processing unit is configured to input the first sample object style image and the preset style object image into the style object discrimination network for style object discrimination processing to obtain the style object discrimination information;
and the style code discrimination processing unit is configured to input the second sample object style image, the first sample style code and the second sample style code into the style code discrimination network for style code discrimination processing to obtain the style code discrimination information.
FIG. 12 is a block diagram illustrating another image generation apparatus according to an example embodiment. Referring to fig. 12, the apparatus includes:
an original object image acquisition module 1210 configured to perform acquiring a first original object image of a first target object;
a first style conversion processing module 1220, configured to perform style conversion processing on the first original object image input into a first style conversion network, so as to obtain a first target object style image corresponding to the first target object;
the first style conversion network is obtained by performing adversarial training on the first preset image generation network based on the first sample object image and the preset object style image of the target style generated by the image generation method.
FIG. 13 is a block diagram illustrating another image generation apparatus according to an exemplary embodiment. Referring to fig. 13, the apparatus includes:
a data acquisition module 1310 configured to perform acquiring a second original object image and a target style label of a second target object;
a second style conversion processing module 1320, configured to perform style conversion processing by inputting the second original object image and the target style label into a second style conversion network, so as to obtain a second target object style image corresponding to the second target object;
the second style conversion network is obtained by performing adversarial training on the second preset target image generation network based on the second sample object image, the multiple target style labels and the preset object style images of the multiple target styles generated by the image generation method.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 14 is a block diagram illustrating an electronic device for image generation, which may be a terminal, according to an exemplary embodiment, and an internal structure thereof may be as shown in fig. 14. The electronic device comprises a processor, a memory, a network interface, a display screen and an input device which are connected through a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic equipment comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the electronic device is used for connecting and communicating with an external terminal through a network. The computer program is executed by a processor to implement an image generation method. The display screen of the electronic equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the electronic equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the electronic equipment, an external keyboard, a touch pad or a mouse and the like.
Fig. 15 is a block diagram illustrating another electronic device for image generation, which may be a server, according to an example embodiment, and an internal structure thereof may be as shown in fig. 15. The electronic device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic equipment comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the electronic device is used for connecting and communicating with an external terminal through a network. The computer program is executed by a processor to implement an image generation method.
It will be understood by those skilled in the art that the configurations shown in fig. 14 or fig. 15 are only block diagrams of some configurations relevant to the present disclosure, and do not constitute a limitation on the electronic device to which the present disclosure is applied, and a particular electronic device may include more or less components than those shown in the drawings, or combine some components, or have a different arrangement of components.
In an exemplary embodiment, there is also provided an electronic device including: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to implement the image generation method as in the embodiments of the present disclosure.
In an exemplary embodiment, there is also provided a computer-readable storage medium, in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform an image generation method in embodiments of the present disclosure.
In an exemplary embodiment, a computer program product containing instructions is also provided, which when run on a computer, causes the computer to perform the image generation method in the embodiments of the present disclosure.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
Claims (10)
1. An image generation method, comprising:
acquiring a preset object code and a target style code of a target style;
performing style fusion processing on the target style code and the preset object code based on network fusion parameters corresponding to a preset number of network layers in a style fusion network to obtain a target style fusion code; the network fusion parameters are determined based on fusion data corresponding to the preset number of network layers and target fusion weights, and the target fusion weights are obtained by performing fusion weight learning based on the target style codes and the preset object codes;
and inputting the target style fusion code into a target image generation network for image generation processing to obtain a preset object style image corresponding to the target style.
2. The image generation method according to claim 1, wherein the style fusion processing of the target style code and the preset object code based on network fusion parameters corresponding to a preset number of network layers in a style fusion network to obtain a target style fusion code comprises:
acquiring target fusion regulation and control data, wherein the target fusion regulation and control data are used for regulating and controlling the fusion position of the target style code and the preset object code in the style fusion network;
determining fusion data corresponding to the preset number of network layers according to the target fusion regulation and control data;
splicing the target style code and the preset object code to obtain a target splicing code;
performing fusion weight learning based on the target splicing codes to obtain the target fusion weight;
weighting the fusion data and the target fusion weight to obtain the network fusion parameters;
and performing style fusion processing on the target style code and the preset object code in the preset number of network layers based on the network fusion parameters to obtain the target style fusion code.
3. An image generation method, comprising:
acquiring a first original object image of a first target object;
inputting the first original object image into a first style conversion network for style conversion processing to obtain a first target object style image corresponding to the first target object;
the first style conversion network is obtained by performing adversarial training on the first preset image generation network based on the first sample object image and the preset object style image of the target style generated by the image generation method according to any one of claims 1 to 2.
4. An image generation method, comprising:
acquiring a second original object image and a target style label of a second target object;
inputting the second original object image and the target style label into a second style conversion network for style conversion processing to obtain a second target object style image corresponding to the second target object;
the second style conversion network is obtained by performing adversarial training on a second preset target image generation network based on a second sample object image, a plurality of target style labels and preset object style images of a plurality of target styles generated by the image generation method according to any one of claims 1 to 2.
5. An image generation apparatus, comprising:
the code acquisition module is configured to execute the steps of acquiring preset object codes and target style codes of a target style;
the first style fusion processing module is configured to perform style fusion processing on the target style code and the preset object code based on network fusion parameters corresponding to a preset number of network layers in a style fusion network to obtain a target style fusion code; the network fusion parameters are determined based on fusion data corresponding to the preset number of network layers and target fusion weights, and the target fusion weights are obtained by performing fusion weight learning based on the target style codes and the preset object codes;
and the first image generation processing module is configured to input the target style fusion code into a target image generation network for image generation processing to obtain a preset object style image corresponding to the target style.
6. An image generation apparatus, comprising:
an original object image acquisition module configured to perform acquisition of a first original object image of a first target object;
the first style conversion processing module is configured to input the first original object image into a first style conversion network for style conversion processing to obtain a first target object style image corresponding to the first target object;
the first style conversion network is obtained by performing adversarial training on the first preset image generation network based on the first sample object image and the preset object style image of the target style generated by the image generation method according to any one of claims 1 to 2.
7. An image generation apparatus, comprising:
a data acquisition module configured to perform acquisition of a second original object image and a target style label of a second target object;
the second style conversion processing module is configured to input the second original object image and the target style label into a second style conversion network for style conversion processing to obtain a second target object style image corresponding to the second target object;
the second style conversion network is obtained by performing adversarial training on a second preset target image generation network based on a second sample object image, a plurality of target style labels and preset object style images of a plurality of target styles generated by the image generation method according to any one of claims 1 to 2.
8. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the image generation method of any of claims 1 to 4.
9. A computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the image generation method of any of claims 1 to 4.
10. A computer program product comprising computer instructions, characterized in that the computer instructions, when executed by a processor, implement the image generation method of any of claims 1 to 4.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111371705.0A CN114202456A (en) | 2021-11-18 | 2021-11-18 | Image generation method, image generation device, electronic equipment and storage medium |
PCT/CN2022/094971 WO2023087656A1 (en) | 2021-11-18 | 2022-05-25 | Image generation method and apparatus
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111371705.0A CN114202456A (en) | 2021-11-18 | 2021-11-18 | Image generation method, image generation device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114202456A true CN114202456A (en) | 2022-03-18 |
Family
ID=80648046
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111371705.0A Pending CN114202456A (en) | 2021-11-18 | 2021-11-18 | Image generation method, image generation device, electronic equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114202456A (en) |
WO (1) | WO2023087656A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118014854B (en) * | 2023-11-20 | 2024-09-27 | 北京汇畅数宇科技发展有限公司 | AI model-based face stylized processing method and device and computer equipment |
CN118411449A (en) * | 2024-07-02 | 2024-07-30 | 深圳兔展智能科技有限公司 | Method and device for generating Chinese art poster, computer equipment and storage medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109472270B (en) * | 2018-10-31 | 2021-09-24 | 京东方科技集团股份有限公司 | Image style conversion method, device and equipment |
KR20210028401A (en) * | 2019-09-04 | 2021-03-12 | 주식회사 엔씨소프트 | Device and method for style translation |
CN111784565B (en) * | 2020-07-01 | 2021-10-29 | 北京字节跳动网络技术有限公司 | Image processing method, migration model training method, device, medium and equipment |
CN112651890B (en) * | 2020-12-18 | 2024-10-15 | 深圳先进技术研究院 | PET-MRI image denoising method and device based on double-coding fusion network model |
CN114202456A (en) * | 2021-11-18 | 2022-03-18 | 北京达佳互联信息技术有限公司 | Image generation method, image generation device, electronic equipment and storage medium |
- 2021-11-18: CN CN202111371705.0A (CN114202456A), status: active, Pending
- 2022-05-25: WO PCT/CN2022/094971 (WO2023087656A1), status: unknown
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020114047A1 (en) * | 2018-12-07 | 2020-06-11 | 北京达佳互联信息技术有限公司 | Image style transfer and data storage method and apparatus, and electronic device |
WO2020200030A1 (en) * | 2019-04-02 | 2020-10-08 | 京东方科技集团股份有限公司 | Neural network training method, image processing method, image processing device, and storage medium |
CN110070540A (en) * | 2019-04-28 | 2019-07-30 | 腾讯科技(深圳)有限公司 | Image generating method, device, computer equipment and storage medium |
CN113393545A (en) * | 2020-11-05 | 2021-09-14 | 腾讯科技(深圳)有限公司 | Image animation processing method and device, intelligent device and storage medium |
CN112967174A (en) * | 2021-01-21 | 2021-06-15 | 北京达佳互联信息技术有限公司 | Image generation model training method, image generation device and storage medium |
Non-Patent Citations (2)
Title |
---|
TYCHO F.A. VAN DER OUDERAA, DANIEL E. WORRALL: "Reversible GANs for Memory-Efficient Image-To-Image Translation", 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 9 January 2020 (2020-01-09) * |
蔡兴泉, 孙海燕 (CAI Xingquan, SUN Haiyan): "Image translation method based on style decoupling and adaptive layer-instance normalization" ("基于风格解耦和自适应层实例归一化的图像翻译方法"), 《科学技术与工程》 (Science Technology and Engineering), vol. 21, no. 17, 18 June 2021 (2021-06-18), pages 7249-7257 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023087656A1 (en) * | 2021-11-18 | 2023-05-25 | 北京达佳互联信息技术有限公司 | Image generation method and apparatus |
CN114418919A (en) * | 2022-03-25 | 2022-04-29 | 北京大甜绵白糖科技有限公司 | Image fusion method and device, electronic equipment and storage medium |
CN115063536A (en) * | 2022-06-30 | 2022-09-16 | 中国电信股份有限公司 | Image generation method and device, electronic equipment and computer readable storage medium |
CN115063536B (en) * | 2022-06-30 | 2023-10-10 | 中国电信股份有限公司 | Image generation method, device, electronic equipment and computer readable storage medium |
CN116152901A (en) * | 2023-04-24 | 2023-05-23 | 广州趣丸网络科技有限公司 | Training method of image generation model and stylized image generation method |
Also Published As
Publication number | Publication date |
---|---|
WO2023087656A1 (en) | 2023-05-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114202456A (en) | Image generation method, image generation device, electronic equipment and storage medium | |
CN112330685B (en) | Image segmentation model training method, image segmentation device and electronic equipment | |
CN112270686B (en) | Image segmentation model training method, image segmentation device and electronic equipment | |
CN111275057A (en) | Image processing method, device and equipment | |
KR102332114B1 (en) | Image processing method and apparatus thereof | |
CN113221645B (en) | Target model training method, face image generating method and related device | |
CN113641835B (en) | Multimedia resource recommendation method and device, electronic equipment and medium | |
CN108921952B (en) | Object functionality prediction method, device, computer equipment and storage medium | |
CN110766638A (en) | Method and device for converting object background style in image | |
CN112818995A (en) | Image classification method and device, electronic equipment and storage medium | |
CN114936377A (en) | Model training and identity anonymization method, device, equipment and storage medium | |
CN113658324A (en) | Image processing method and related equipment, migration network training method and related equipment | |
CN112417985A (en) | Face feature point tracking method, system, electronic equipment and storage medium | |
CN117252791A (en) | Image processing method, device, electronic equipment and storage medium | |
CN117132864B (en) | Multi-mode input digital character generation method, device, equipment and storage medium | |
CN114399424A (en) | Model training method and related equipment | |
CN113886548A (en) | Intention recognition model training method, recognition method, device, equipment and medium | |
CN113704509A (en) | Multimedia recommendation method and device, electronic equipment and storage medium | |
CN112862672A (en) | Bang generation method and device, computer equipment and storage medium | |
CN116758379A (en) | Image processing method, device, equipment and storage medium | |
CN117056721A (en) | Model parameter adjusting method and device, model prediction method, device and medium | |
CN115098722B (en) | Text and image matching method and device, electronic equipment and storage medium | |
CN114821811B (en) | Method and device for generating person composite image, computer device and storage medium | |
CN113947185B (en) | Task processing network generation method, task processing device, electronic equipment and storage medium | |
CN110780850B (en) | Requirement case auxiliary generation method and device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||