CN107516290B - Image conversion network acquisition method and device, computing equipment and storage medium


Info

Publication number
CN107516290B
CN107516290B (application CN201710574583.2A)
Authority
CN
China
Prior art keywords: image, network, style, sample, sample image
Prior art date
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN201710574583.2A
Other languages: Chinese (zh)
Other versions: CN107516290A (en)
Inventor
申发龙
颜水成
曾钢
程斌
Current Assignee: Beijing Qihoo Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Beijing Qihoo Technology Co Ltd
Priority date (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Application filed by Beijing Qihoo Technology Co Ltd
Priority to CN201710574583.2A
Publication of CN107516290A
Application granted
Publication of CN107516290B

Classifications

    • G06T3/04
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Abstract

The invention discloses an image conversion network acquisition method and device, a computing device and a computer storage medium. The image conversion network acquisition method is executed based on a trained first network and comprises the following steps: acquiring a first image and a second image; and inputting the first image and the second image into the first network respectively, and performing a weighted operation in a weighted operation layer of the first network according to preset fusion weights, to obtain a second network corresponding to the fused style of the first image and the second image. According to the technical scheme provided by the invention, the trained first network can be used to quickly obtain an image conversion network corresponding to the fused style of two style images, which effectively improves the efficiency of obtaining an image conversion network and optimizes the processing mode of the image conversion network.

Description

Image conversion network acquisition method and device, computing equipment and storage medium
Technical Field
The invention relates to the technical field of image processing, in particular to an image conversion network acquisition method, an image conversion network acquisition device, a computing device and a computer storage medium.
Background
By utilizing image stylization processing technology, the style of a style image can be transferred to a daily photograph, so that the image obtains a better visual effect. In the prior art, a given style image is directly input into a neural network, a large number of content images are used as sample images, an image conversion network corresponding to the given style image is obtained through many iterations of training, and the style conversion of an input content image is then realized using this image conversion network. This requires a long training time, resulting in inefficient acquisition of the image conversion network, and it is also difficult with the prior art to obtain an image conversion network corresponding to the fused style of two style images.
Disclosure of Invention
In view of the above, the present invention has been made to provide an image conversion network acquisition method, apparatus, computing device, and computer storage medium that overcome or at least partially solve the above-mentioned problems.
According to an aspect of the present invention, there is provided an image conversion network acquisition method performed based on a trained first network, the method including:
acquiring a first image and a second image;
and respectively inputting the first image and the second image into a first network, and performing weighted operation on a weighted operation layer of the first network according to a preset fusion weight value to obtain a second network corresponding to the fused style of the first image and the second image.
Further, the step of inputting the first image and the second image into the first network respectively, and performing a weighted operation on a weighted operation layer of the first network according to a preset fusion weight, so as to obtain a second network corresponding to the fused style of the first image and the second image further includes:
inputting a first image into a first network, performing forward propagation operation once in the first network, and determining weighted operation layer data corresponding to the first image;
inputting the second image into the first network, and performing forward propagation operation once in the first network to determine weighted operation layer data corresponding to the second image;
and according to the preset fusion weight, carrying out weighted operation on the weighted operation layer data corresponding to the first image and the weighted operation layer data corresponding to the second image in the weighted operation layer of the first network to obtain a second network corresponding to the fused style of the first image and the second image.
Further, the weighting operation layer is a bottleneck layer; the bottleneck layer is the layer with the smallest vector dimension among the convolutional layers of the first network.
Further, the weighted operation layer data is vector data.
Further, the sample images used for training the first network comprise: a plurality of first sample images stored in a style image library and a plurality of second sample images stored in a content image library.
Further, the training process of the first network is completed through a plurality of iterations; in an iteration process, a first sample image is extracted from the style image library, at least one second sample image is extracted from the content image library, and the first network training is realized by utilizing the first sample image and the at least one second sample image.
Further, in the process of multiple iterations, one first sample image is extracted and kept fixed while at least one second sample image is extracted in rotation; after the second sample images in the content image library have been extracted, the next first sample image is substituted and at least one second sample image is extracted again.
Further, the training process of the first network is completed through a plurality of iterations; wherein, the one-time iteration process comprises the following steps:
generating a third sample image corresponding to the second sample image using a third network corresponding to the style of the first sample image;
and obtaining a first network loss function according to the style loss between the third sample image and the first sample image and the content loss between the third sample image and the second sample image, and realizing the training of the first network by using the first network loss function.
Further, the training step of the first network comprises:
extracting a first sample image from the style image library, and extracting at least one second sample image from the content image library;
inputting the first sample image into the first network to obtain a third network corresponding to the style of the first sample image;
generating corresponding third sample images respectively aiming at least one second sample image by utilizing a third network corresponding to the style of the first sample image;
obtaining a first network loss function according to the style loss between at least one third sample image and the first sample image and the content loss between at least one third sample image and the corresponding second sample image, and updating the weight parameter of the first network according to the first network loss function;
the training step of the first network is iteratively performed until a predetermined convergence condition is met.
Further, the predetermined convergence condition includes: the iteration times reach the preset iteration times; and/or the output value of the first network loss function is smaller than a preset threshold value; and/or the visual effect parameter of the third sample image corresponding to the second sample image reaches the preset visual effect parameter.
Further, inputting the first sample image into the first network, and obtaining a third network corresponding to the style of the first sample image further includes:
extracting style texture features from the first sample image;
and inputting the style texture features into the first network to obtain a third network corresponding to the style texture features.
Further, the first network is a meta-network obtained by training the neural network, and the second network is an image conversion network.
Further, the third network is a conversion network obtained in the training process of the first network.
Further, the method is performed by the terminal.
According to another aspect of the present invention, there is provided an image stylized fusion processing method, including:
and performing stylization processing on the third image to be processed by using the second network obtained by the image conversion network obtaining method to obtain a fourth image corresponding to the third image.
According to another aspect of the present invention, there is provided an image stylization intensity adjustment method, including:
and performing stylization processing on any one of the first image and the second image by using the second network obtained by the image conversion network acquisition method to obtain a corresponding fifth image.
According to another aspect of the present invention, there is provided an image conversion network acquisition apparatus that operates based on a trained first network, the apparatus including:
an acquisition module adapted to acquire a first image and a second image;
and the mapping module is suitable for respectively inputting the first image and the second image into a first network, and performing weighted operation on a weighted operation layer of the first network according to a preset fusion weight value to obtain a second network corresponding to the fused style of the first image and the second image.
Further, the mapping module is further adapted to:
inputting a first image into a first network, performing forward propagation operation once in the first network, and determining weighted operation layer data corresponding to the first image;
inputting the second image into the first network, and performing forward propagation operation once in the first network to determine weighted operation layer data corresponding to the second image;
and according to the preset fusion weight, carrying out weighted operation on the weighted operation layer data corresponding to the first image and the weighted operation layer data corresponding to the second image in the weighted operation layer of the first network to obtain a second network corresponding to the fused style of the first image and the second image.
Further, the weighting operation layer is a bottleneck layer; the bottleneck layer is the layer with the smallest vector dimension among the convolutional layers of the first network.
Further, the weighted operation layer data is vector data.
Further, the sample images used for training the first network comprise: a plurality of first sample images stored in the style image library and a plurality of second sample images stored in the content image library.
Further, the apparatus further comprises: a first network training module; the training process of the first network is completed through multiple iterations;
the first network training module is adapted to: in an iteration process, a first sample image is extracted from the style image library, at least one second sample image is extracted from the content image library, and the first network training is realized by utilizing the first sample image and the at least one second sample image.
Further, the first network training module is further adapted to:
fixedly extracting one first sample image while extracting at least one second sample image in rotation; and after the second sample images in the content image library have been extracted, substituting the next first sample image and extracting at least one second sample image again.
Further, the apparatus further comprises: a first network training module; the training process of the first network is completed through multiple iterations;
the first network training module is adapted to: generating a third sample image corresponding to the second sample image by using a third network corresponding to the style of the first sample image in an iteration process;
and obtaining a first network loss function according to the style loss between the third sample image and the first sample image and the content loss between the third sample image and the second sample image, and realizing the training of the first network by using the first network loss function.
Further, the apparatus further comprises: a first network training module;
the first network training module comprises:
the extraction unit is suitable for extracting a first sample image from the style image library and extracting at least one second sample image from the content image library;
the generating unit is suitable for inputting the first sample image into the first network to obtain a third network corresponding to the style of the first sample image;
a processing unit adapted to generate corresponding third sample images for the at least one second sample image, respectively, using a third network corresponding to the style of the first sample image;
the updating unit is suitable for obtaining a first network loss function according to the style loss between at least one third sample image and the first sample image and the content loss between at least one third sample image and the corresponding second sample image, and updating the weight parameter of the first network according to the first network loss function;
the first network training module is operated iteratively until a predetermined convergence condition is met.
Further, the predetermined convergence condition includes: the iteration times reach the preset iteration times; and/or the output value of the first network loss function is smaller than a preset threshold value; and/or the visual effect parameter of the third sample image corresponding to the second sample image reaches the preset visual effect parameter.
Further, the generation unit is further adapted to:
extracting style texture features from the first sample image;
and inputting the style texture features into the first network to obtain a third network corresponding to the style texture features.
Further, the first network is a meta-network obtained by training the neural network, and the second network is an image conversion network.
Further, the third network is a conversion network obtained in the training process of the first network.
According to another aspect of the present invention, there is provided a terminal including the image conversion network acquisition apparatus described above.
According to another aspect of the present invention, there is provided an image stylized fusion processing apparatus comprising:
and the first processing module is suitable for performing stylization processing on the third image to be processed by using the second network obtained by the image conversion network obtaining device to obtain a fourth image corresponding to the third image.
According to another aspect of the present invention, there is provided an image stylization intensity adjustment apparatus, including:
and the second processing module is suitable for performing stylization processing on any one of the first image and the second image by using the second network obtained by the image conversion network obtaining device to obtain a corresponding fifth image.
According to another aspect of the present invention, there is provided a computing device comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the image conversion network acquisition method.
According to another aspect of the present invention, there is provided a computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform operations corresponding to the image conversion network acquisition method.
According to another aspect of the present invention, there is provided a computing device comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the image stylized fusion processing method.
According to another aspect of the present invention, there is provided a computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform operations corresponding to the method for stylized fusion processing of images as described above.
According to yet another aspect of the present invention, there is provided a computing device comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the image stylization intensity adjusting method.
According to still another aspect of the present invention, there is provided a computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform operations corresponding to the image stylization intensity adjustment method.
According to the technical scheme provided by the invention, a first image and a second image are obtained, then the first image and the second image are respectively input into a first network, and weighted operation is carried out on a weighted operation layer of the first network according to a preset fusion weight value, so that a second network corresponding to the fused style of the first image and the second image is obtained. According to the technical scheme provided by the invention, the trained first network can be used for quickly obtaining the image conversion network corresponding to the fused style of the two style images, so that the efficiency of obtaining the image conversion network is effectively improved, and the processing mode of the image conversion network is optimized.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flow chart illustrating an image conversion network acquisition method according to an embodiment of the present invention;
FIG. 2 shows a flow diagram of a network training method according to one embodiment of the invention;
FIG. 3 illustrates an exemplary set of graphs of the results of stylizing a content image using an image conversion network corresponding to the style of a given stylized image, as derived by the present invention;
FIG. 4 is a flowchart illustrating an image conversion network acquisition method according to another embodiment of the present invention;
FIG. 5a is a flow diagram illustrating a method of stylized fusion processing of an image, according to one embodiment of the invention;
FIG. 5b illustrates an exemplary set of graphs of the results of stylizing a third image using a second network corresponding to the fused style of the first and second images, as derived by the present invention;
FIG. 6a shows a schematic flow diagram of an image stylization intensity adjustment method according to an embodiment of the invention;
FIG. 6b illustrates an exemplary set of graphs of the results of stylizing a second image using a second network corresponding to the fused style of the first and second images, as derived by the present invention;
fig. 7 is a block diagram showing the configuration of an image conversion network acquisition apparatus according to an embodiment of the present invention;
fig. 8 is a block diagram showing a connection configuration of an image conversion network acquisition apparatus and an image stylized fusion processing apparatus according to another embodiment of the present invention;
fig. 9 is a block diagram showing a connection configuration of an image conversion network acquisition apparatus and an image stylization intensity adjustment apparatus according to another embodiment of the present invention;
FIG. 10 shows a schematic structural diagram of a computing device according to an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 is a flowchart illustrating an image conversion network acquisition method according to an embodiment of the present invention. The method is performed by a terminal and is executed based on a trained first network. As shown in fig. 1, the method comprises the following steps:
step S100, a first image and a second image are acquired.
The first image and the second image are two style images having different styles; for convenience of distinction, the two style images are referred to as the first image and the second image, respectively. Specifically, the first image and the second image may be style images having any style, and are not limited to style images having some specific style.
And S101, respectively inputting the first image and the second image into a first network, and performing weighted operation on a weighted operation layer of the first network according to a preset fusion weight value to obtain a second network corresponding to the fused style of the first image and the second image.
In an embodiment of the present invention, the first network is a meta network (meta network) obtained by training a neural network, and the second network is an image conversion network. The first network has been trained; specifically, the sample images used for training the first network include a plurality of first sample images stored in the style image library and a plurality of second sample images stored in the content image library, where the first sample images are style sample images and the second sample images are content sample images. The trained first network is well suited to arbitrary style images and arbitrary content images: after the first image is input into the first network, the image conversion network corresponding to the style of the first image can be mapped quickly, and likewise, after the second image is input into the first network, the image conversion network corresponding to the style of the second image can be mapped quickly. Therefore, in order to obtain the image conversion network corresponding to the fused style of the first image and the second image, in step S101 the acquired first image and second image may be input into the first network respectively, a weighted operation may be performed in the weighted operation layer of the first network according to the preset fusion weights, and the weighted operation result may be substituted into the remainder of the first network, so that the second network corresponding to the fused style of the first image and the second image is obtained quickly.
The preset fusion weights can be set by those skilled in the art according to actual needs, and are not limited herein. In an embodiment of the present invention, the preset fusion weights may include a preset fusion weight corresponding to the first image and a preset fusion weight corresponding to the second image. Assuming that the preset fusion weight corresponding to the first image is 0.8 and that corresponding to the second image is 0.2, the fused style of the first image and the second image is a style that fuses 80% of the style of the first image with 20% of the style of the second image.
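As a worked illustration of the weighting (using h1 and h2 as shorthand for the weighted operation layer data of the first and second images, a notation introduced here for clarity): the weighted operation computes h_fused = 0.8 · h1 + 0.2 · h2, and the second network obtained by substituting h_fused into the remainder of the first network corresponds to a style that is 80% that of the first image and 20% that of the second image.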
Specifically, the training process of the first network is completed through a plurality of iterations. Optionally, in an iterative process, a first sample image is extracted from the style image library, at least one second sample image is extracted from the content image library, and the training of the first network is implemented by using the first sample image and the at least one second sample image.
Optionally, the one-iteration process comprises: generating a third sample image corresponding to the second sample image using a third network corresponding to the style of the first sample image; and obtaining a first network loss function according to the style loss between the third sample image and the first sample image and the content loss between the third sample image and the second sample image, and realizing the training of the first network by using the first network loss function.
Wherein the third network is a conversion network obtained in the training process of the first network. Specifically, the second network and the third network are both image conversion networks in nature; the difference between the two is that the second network is an image conversion network corresponding to the fused style of two style images obtained in practical application, while the third network is an image conversion network corresponding to the style of a single style image obtained in the training process of the first network.
According to the image conversion network acquisition method provided by the embodiment of the invention, the first image and the second image are acquired, then the first image and the second image are respectively input into the first network, and the weighted operation is carried out on the weighted operation layer of the first network according to the preset fusion weight value, so that the second network corresponding to the fused style of the first image and the second image is obtained. According to the technical scheme provided by the invention, the trained first network can be used for quickly obtaining the image conversion network corresponding to the fused style of the two style images, so that the efficiency of obtaining the image conversion network is effectively improved, and the processing mode of the image conversion network is optimized.
Fig. 2 is a flowchart illustrating a network training method according to an embodiment of the present invention, and as shown in fig. 2, the training step of the first network includes the following steps:
step S200, a first sample image is extracted from the style image library, and at least one second sample image is extracted from the content image library.
In a specific training process, the style image library stores 100,000 first sample images and the content image library stores 100,000 second sample images, wherein the first sample images are style images and the second sample images are content images. In step S200, one first sample image is extracted from the style image library, and at least one second sample image is extracted from the content image library. The number of second sample images can be set by those skilled in the art according to actual needs, and is not limited herein.
Step S201, inputting the first sample image into the first network, and obtaining a third network corresponding to the style of the first sample image.
In one embodiment of the present invention, the first network is a meta network obtained by training a neural network. For example, the neural network may be a VGG-16 convolutional neural network. Specifically, in step S201, style texture features are extracted from the first sample image, the extracted style texture features are input into the first network, and a forward propagation operation is performed in the first network, so as to obtain a third network corresponding to the style texture features.
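As a concrete illustration, the following is a minimal sketch of such a meta network in PyTorch, assuming a fixed VGG-16 feature extractor and fully connected layers that map the style texture features to the flattened filter weights of the third network; the layer sizes and module names are illustrative assumptions, not values prescribed by the patent.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class MetaNet(nn.Module):
    """Sketch of the first network: style image -> weights of the third network."""
    def __init__(self, num_filter_params):
        super().__init__()
        # Fixed VGG-16 extractor for style texture features (an assumption).
        self.features = vgg16(weights="IMAGENET1K_V1").features.eval()
        for p in self.features.parameters():
            p.requires_grad = False
        self.hidden = nn.Linear(512, 1024)                # illustrative sizes
        self.to_filters = nn.Linear(1024, num_filter_params)

    def forward(self, style_image):
        # Global-average-pool the deepest feature maps into a style vector.
        feat = self.features(style_image).mean(dim=(2, 3))
        h = torch.relu(self.hidden(feat))
        return self.to_filters(h)                         # flattened weights w
```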
Step S202 is to generate a third sample image corresponding to each of the at least one second sample image using a third network corresponding to the style of the first sample image.
After the third network corresponding to the style of the first sample image is obtained, corresponding third sample images can be generated for the at least one second sample image using that third network, wherein each third sample image is a style migration image corresponding to its second sample image and has a style consistent with the first sample image. When 8 second sample images are extracted in step S200, a corresponding third sample image is generated for each of the 8 second sample images in step S202.
Step S203, obtaining a first network loss function according to the style loss between the at least one third sample image and the first sample image and the content loss between the at least one third sample image and the corresponding second sample image, and updating the weight parameter of the first network according to the first network loss function.
Wherein, those skilled in the art can set the specific content of the first network loss function according to actual needs, and the content is not limited herein. In one embodiment, the first network loss function may be:
L(θ) = λ_c · CP(I, I_c) + λ_s · SP(I, I_s)

wherein I_c is the second sample image, I_s is the first sample image, I is the third sample image, CP is the perceptual function for perceiving the content difference, so that CP(I, I_c) is the content loss between the third sample image and the corresponding second sample image, SP is the perceptual function for perceiving the style difference, so that SP(I, I_s) is the style loss between the third sample image and the first sample image, θ is the weight parameter of the first network, λ_c is the preset content loss weight, and λ_s is the preset style loss weight. According to the first network loss function, a back propagation operation is performed, and the weight parameter θ of the first network is updated according to the operation result.
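The patent names CP and SP only abstractly; the sketch below assumes, as is standard in neural style transfer, that the content loss compares VGG feature maps and the style loss compares their Gram matrices. The loss-weight values are illustrative.

```python
import torch
import torch.nn.functional as F

def gram(feat):
    # Gram matrix of a (B, C, H, W) feature map, a common style statistic.
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def meta_loss(feats_out, feats_content, feats_style,
              lambda_c=1.0, lambda_s=250.0):
    """feats_* are lists of VGG feature maps for I, I_c and I_s respectively."""
    # CP(I, I_c): content difference at one mid-level layer (the index is an assumption).
    content = F.mse_loss(feats_out[2], feats_content[2])
    # SP(I, I_s): style difference summed over all supplied layers.
    style = sum(F.mse_loss(gram(a), gram(b))
                for a, b in zip(feats_out, feats_style))
    return lambda_c * content + lambda_s * style
```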
In a specific training process, the first network is a meta network obtained by training a neural network, and the third network is a conversion network obtained in the training process of the first network. The first network is trained using a stochastic gradient descent algorithm. The specific training process comprises:

1. Setting the number of iterations k for one first sample image and the number m of second sample images I_c. For example, k may be set to 20 and m to 8, which indicates that in the training process of the meta network, 20 iterations are performed for each first sample image, and each iteration extracts 8 second sample images I_c from the content image library.

2. Fixedly extracting a first sample image I_s from the style image library.

3. Inputting the first sample image I_s into the first network N(·; θ) and performing a feed-forward propagation operation in the first network to obtain the third network w corresponding to the style of I_s. The mapping formula between the third network w and the first network N(·; θ) is: w ← N(I_s; θ).

4. Inputting the m second sample images I_c, which may be denoted {I_c^(1), ..., I_c^(m)}.

5. Using the third network w, generating a corresponding third sample image I for each second sample image I_c.

6. Updating the weight parameter θ of the first network according to the first network loss function, which is specifically:

L(θ) = λ_c · CP(I, I_c) + λ_s · SP(I, I_s)

where λ_c is the preset content loss weight and λ_s is the preset style loss weight.
And step S204, iteratively executing the training step of the first network until a preset convergence condition is met.
Those skilled in the art can set the predetermined convergence condition according to actual requirements, which is not limited herein. For example, the predetermined convergence condition may include: the number of iterations reaches a preset number of iterations; and/or the output value of the first network loss function is smaller than a preset threshold value; and/or the visual effect parameter of the third sample image corresponding to the second sample image reaches a preset visual effect parameter. Accordingly, satisfaction of the predetermined convergence condition may be judged by checking whether the number of iterations has reached the preset number, whether the output value of the first network loss function has fallen below the preset threshold, or whether the visual effect parameter of the third sample image corresponding to the second sample image has reached the preset visual effect parameter. In step S204, the training step of the first network is iteratively performed until the predetermined convergence condition is satisfied, thereby obtaining the trained first network.
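A minimal sketch of such a convergence test, covering the first two conditions (the visual effect parameter check is omitted since the patent does not define how it is computed); the threshold values are illustrative:

```python
def converged(iteration, loss_value, max_iters=20000, loss_threshold=1e-3):
    # Stop when the preset iteration count is reached or the loss falls
    # below the preset threshold (either condition suffices).
    return iteration >= max_iters or loss_value < loss_threshold
```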
It is worth noting that, in order to improve the stability of the first network during training, in the process of multiple iterations one first sample image is extracted and kept fixed while at least one second sample image is extracted in rotation; after the second sample images in the content image library have been extracted, the next first sample image is substituted and at least one second sample image is extracted again.

By fixing the first sample image and continuously replacing the second sample image, a first network suitable for that first sample image and any second sample image can be trained efficiently; the next first sample image is then substituted and the second sample images are again rotated, so that a first network suitable for the two first sample images and any second sample image is trained. This process is repeated until the first sample images in the style image library and the second sample images in the content image library have all been extracted, so that a first network suitable for any first sample image and any second sample image is obtained through training, which is equivalent to training a first network suitable for any style image and any content image. This effectively shortens the time required for training the first network and improves its training efficiency.
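Putting the pieces together, the following is a condensed sketch of this iteration scheme (one style image fixed for k iterations while the content batch rotates), reusing `MetaNet` and `meta_loss` from the sketches above; `build_transform_net` and `vgg_features` are assumed helpers, and a real implementation would apply the generated weights functionally so that gradients flow back into θ.

```python
import torch
from itertools import cycle

def train_meta_network(meta_net, style_loader, content_loader,
                       k=20, m=8, lr=1e-3):
    opt = torch.optim.SGD(meta_net.parameters(), lr=lr)  # stochastic gradient descent
    content_iter = cycle(content_loader)
    for style_img in style_loader:           # fixedly extract one I_s at a time
        for _ in range(k):                   # k iterations per style image
            w = meta_net(style_img)          # third network: w <- N(I_s; theta)
            transform_net = build_transform_net(w)   # must keep the graph to theta
            contents = next(content_iter)[:m]        # m second sample images I_c
            outputs = transform_net(contents)        # third sample images I
            loss = meta_loss(vgg_features(outputs),
                             vgg_features(contents),
                             vgg_features(style_img))
            opt.zero_grad()
            loss.backward()                  # back propagation
            opt.step()                       # update the weight parameter theta
```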
Because the trained first network can be well applied to any style image and any content image, the image conversion network corresponding to the style of a given style image can be quickly mapped by using the first network instead of being trained directly with a neural network; therefore, compared with the prior art, the speed of obtaining the image conversion network is greatly improved. In addition, by using the first network and performing a weighted operation in its weighted operation layer according to the preset fusion weights, the image conversion network corresponding to the fused style of two style images can be obtained quickly.
Fig. 3 shows an exemplary set of results of stylizing content images using image conversion networks corresponding to the styles of given style images, obtained by the present invention. As shown in fig. 3, the images in the first column are style images, the images in the first row are content images, and the remaining images are style migration images. The image in the second row and second column is the style migration image obtained by stylizing the content image in the first row, second column using the image conversion network corresponding to the style of the style image in the second row, first column; the image in the second row and third column is the style migration image obtained by stylizing the content image in the first row, third column using the same image conversion network, and so on. As shown in fig. 3, each style migration image has the style of the corresponding style image.
The advantages of the method provided by the present invention will be illustrated by comparing with two image stylization processing methods in the prior art. Table 1 shows the comparison result between the present method and two image stylization processing methods in the prior art.
TABLE 1

Method | Applicable styles | Time to obtain image conversion network | Time to obtain style migration image
Gatys et al. (2015) | Any style | N/A (no conversion network) | 9.52 s
Johnson et al. (2016) | One style only | 4 hours | 0.015 s
Present method | Any style | 0.022 s | 0.015 s
As shown in Table 1, Gatys et al. published the paper "A Neural Algorithm of Artistic Style" in 2015; the method proposed in that paper cannot obtain an image conversion network, although it can be applied to any style, and it takes 9.52 seconds to obtain a corresponding style migration image.

Johnson et al. published the paper "Perceptual Losses for Real-Time Style Transfer and Super-Resolution" at the European Conference on Computer Vision in 2016; the method proposed in that paper takes 4 hours to obtain a corresponding image conversion network and is applicable to only one style, but takes only 0.015 s to obtain a corresponding style migration image.

Compared with the above two methods, the method provided by the invention is applicable to any style, takes only 0.022 s to obtain the corresponding image conversion network, and takes only 0.015 s to obtain the corresponding style migration image, thereby effectively improving the speed of obtaining the image conversion network and the efficiency of obtaining style migration images.
Fig. 4 is a flowchart illustrating an image conversion network acquisition method according to another embodiment of the present invention. The method is performed by a terminal and is executed based on a trained first network. As shown in fig. 4, the method comprises the following steps:
step S400, a first image and a second image are acquired.
The first image and the second image are two style images with different styles; they may be style images of any style and are not limited to style images of some specific style. The first image and the second image may be style images from a website, or style images shared by other users, which is not limited herein.
Step S401, inputting the first image into the first network, performing a forward propagation operation in the first network, and determining weighted operation layer data corresponding to the first image.
Because the first network has been trained, it is well suited to images of any style and any content. After the first image is input into the first network, no training is needed for the first image; a single forward propagation operation in the first network suffices to determine the weighted operation layer data in the weighted operation layer corresponding to the first image. The weighted operation layer data may be vector data.

Specifically, the weighted operation layer may be a bottleneck layer, in which case the weighted operation layer data is bottleneck layer data. The bottleneck layer is the layer with the smallest vector dimension among the convolutional layers of the first network, and its transfer function may be linear or non-linear. The bottleneck layer plays a crucial role in the first network: by ruling out a trivial one-to-one mapping between output and input, it forces the first network to encode and compress the input style image and then, after the bottleneck layer, decode and decompress it to generate an estimate of the style image. The bottleneck layer therefore filters noise, and the bottleneck layer data contains the basic information of the style image. Accordingly, in step S401, after the first image is input into the first network, the determined weighted operation layer data corresponding to the first image contains the basic information of the first image.
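A short sketch of determining the weighted operation layer data with a single forward pass, using a PyTorch forward hook; `meta_net.bottleneck` is an assumed module name for the smallest-dimension layer, since the patent does not name it in code.

```python
import torch

def bottleneck_data(meta_net, style_image):
    captured = {}
    def hook(module, inputs, output):
        captured["h"] = output.detach()      # vector data at the bottleneck layer
    handle = meta_net.bottleneck.register_forward_hook(hook)
    with torch.no_grad():
        meta_net(style_image)                # one forward propagation, no training
    handle.remove()
    return captured["h"]
```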
Step S402, inputting the second image into the first network, performing a forward propagation operation in the first network, and determining weighted operation layer data corresponding to the second image.
Because the trained first network is well suited to images of any style and any content, after the second image is input into the first network no training is needed for the second image; a single forward propagation operation in the first network suffices to determine the weighted operation layer data in the weighted operation layer corresponding to the second image, and this weighted operation layer data contains the basic information of the second image.
Step S403, according to the preset fusion weight, performing a weighting operation on the weighted operation layer data corresponding to the first image and the weighted operation layer data corresponding to the second image in the weighted operation layer of the first network, so as to obtain a second network corresponding to the fused style of the first image and the second image.
Specifically, since the weighted operation layer data corresponding to the first image contains the basic information of the first image and the weighted operation layer data corresponding to the second image contains the basic information of the second image, in step S403 the two sets of weighted operation layer data are combined by a weighted operation in the weighted operation layer of the first network according to the preset fusion weights, and the weighted operation result is substituted into the remainder of the first network, so that the second network corresponding to the fused style of the first image and the second image is obtained; this also demonstrates the spatial continuity of the network manifold.

In an embodiment of the present invention, the preset fusion weights may include a preset fusion weight corresponding to the first image and a preset fusion weight corresponding to the second image. Specifically, in the weighted operation layer of the first network, the weighted operation layer data corresponding to the first image is multiplied by the preset fusion weight corresponding to the first image, the weighted operation layer data corresponding to the second image is multiplied by the preset fusion weight corresponding to the second image, and the two products are summed, so as to obtain the second network corresponding to the fused style of the first image and the second image. Assuming that the preset fusion weight corresponding to the first image is 0.8 and that corresponding to the second image is 0.2, the fused style is a style that fuses 80% of the style of the first image with 20% of the style of the second image.
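A minimal sketch of step S403 built on the helpers above; `meta_net.decode` stands in for substituting the fused vector into the remainder of the first network to produce the conversion network, and is an assumed helper rather than an API defined by the patent.

```python
def fused_second_network(meta_net, first_image, second_image, w1=0.8, w2=0.2):
    h1 = bottleneck_data(meta_net, first_image)    # basic information of the first image
    h2 = bottleneck_data(meta_net, second_image)   # basic information of the second image
    h_fused = w1 * h1 + w2 * h2                    # weighted operation in the bottleneck layer
    return meta_net.decode(h_fused)                # second network for the fused style
```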
According to the image conversion network acquisition method provided by the embodiment of the invention, the weighted operation layer data corresponding to the two style images can be quickly determined by utilizing the trained first network, the image conversion network corresponding to the fused style of the two style images can be conveniently obtained according to the weighted operation layer data corresponding to the two style images, and the efficiency of obtaining the image conversion network is further improved.
The invention also provides an image stylized fusion processing method, which comprises the following steps: and performing stylization processing on the third image to be processed by utilizing the second network obtained by the image conversion network obtaining method provided by the invention to obtain a fourth image corresponding to the third image.
Fig. 5a is a flow chart of an image stylized fusion processing method according to an embodiment of the present invention, as shown in fig. 5a, the method includes the following steps:
step S500, a third image to be processed is acquired.
When the user wants to process another image other than the first image and the second image into an image having both the style of the first image and the style of the second image, the image may be acquired in step S500. In order to distinguish from the first image and the second image above, another image other than the first image and the second image which the user wants to process is referred to as a third image to be processed in the present invention.
Step S501, a second network corresponding to the fused style of the first image and the second image is used for stylizing a third image to be processed, and a fourth image corresponding to the third image is obtained.
In the above embodiment of the image conversion network acquiring method, how to obtain the second network corresponding to the style of the fused first image and the second image by the image conversion network acquiring method provided by the present invention has been described in detail, and details are not repeated herein.
After the third image to be processed is acquired, the third image is stylized using the second network corresponding to the fused style of the first image and the second image; the fourth image obtained after the stylization is the style migration image corresponding to the third image, and it has the fused style of the first image and the second image. For example, when the second network is an image conversion network fusing 80% of the style of the first image with 20% of the style of the second image, the fourth image obtained by stylizing with this image conversion network has 80% of the style of the first image and 20% of the style of the second image.
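For instance, a minimal usage sketch (with `fused_second_network` from the sketch above and `load_image` as an assumed preprocessing helper returning a (1, 3, H, W) tensor):

```python
import torch

second_network = fused_second_network(meta_net, first_image, second_image)
third_image = load_image("photo.jpg")              # content image to be processed
with torch.no_grad():
    fourth_image = second_network(third_image)     # carries the 80%/20% fused style
```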
Fig. 5b shows an exemplary set of results of stylizing the third image using second networks corresponding to the fused style of the first image and the second image obtained by the present invention. In the set of drawings shown in fig. 5b, the inset in the upper right corner of the first image is the first image, the inset in the upper right corner of the last image is the second image, and the images in the set are a plurality of fourth images obtained by stylizing the third image using second networks obtained with different preset fusion weights; these fourth images carry the style of the first image and the style of the second image in different proportions. As can be seen from fig. 5b, in the fourth images shown from left to right, the proportion of the style of the first image gradually decreases while the proportion of the style of the second image gradually increases.
According to the image stylized fusion processing method provided by the embodiment of the invention, the fusion of the image styles is realized, the stylized processing can be conveniently and rapidly carried out on the images by utilizing the image conversion network corresponding to the fused styles of the two images, the style migration images with the fused styles are obtained, the image stylized processing efficiency is improved, and the image stylized processing mode is optimized.
The invention also provides an image stylized intensity adjusting method, which comprises the following steps: and performing stylization processing on any one of the first image and the second image by using the second network obtained by the image conversion network acquisition method provided by the invention to obtain a corresponding fifth image.
Fig. 6a is a schematic flow chart of an image stylization intensity adjustment method according to an embodiment of the present invention, and as shown in fig. 6a, the method includes the following steps:
step S600, a second image is acquired.
Assuming that the user wants to process the second image into an image having the style of the first image and conforming to a certain stylized intensity, the second image may be acquired in step S600.
Step S601, performing stylization processing on the second image by using a second network corresponding to the fused style of the first image and the second image, to obtain a corresponding fifth image.
In the above embodiment of the image conversion network acquiring method, how to obtain the second network corresponding to the style of the fused first image and the second image by the image conversion network acquiring method provided by the present invention has been described in detail, and details are not repeated herein.
After the second image is acquired, the second image is stylized using the second network corresponding to the fused style of the first image and the second image; the fifth image obtained after the stylization is a style migration image corresponding to the second image, which has the style of the first image and conforms to the preset stylization intensity.
The specific stylization intensity is related to the preset fusion weights used by the image conversion network acquisition method when generating the second network. Assume the user wants to process the second image into an image having the style of the first image at a stylization intensity of 60%. In step S601, the second image is stylized using the second network obtained by the image conversion network acquisition method with a preset fusion weight of 0.6 for the first image and 0.4 for the second image; this second network is an image conversion network fusing 60% of the style of the first image with 40% of the style of the second image. The fifth image corresponding to the second image therefore carries 60% of the style of the first image and retains 40% of the original style of the second image; that is, the fifth image has the style of the first image at a stylization intensity of 60%, thereby realizing adjustment of the stylization intensity of the image.
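A short sketch of this adjustment, reusing the helpers from the earlier sketches; alpha plays the role of the stylization intensity:

```python
import torch

def adjust_intensity(meta_net, first_image, second_image, alpha=0.6):
    """Stylize second_image with intensity alpha of first_image's style."""
    h1 = bottleneck_data(meta_net, first_image)
    h2 = bottleneck_data(meta_net, second_image)
    second_network = meta_net.decode(alpha * h1 + (1 - alpha) * h2)
    with torch.no_grad():
        # The fifth image retains (1 - alpha) of the second image's own style.
        return second_network(second_image)
```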
Fig. 6b shows an exemplary set of results of stylizing the second image using second networks corresponding to the fused style of the first image and the second image obtained by the present invention. In the set of drawings shown in fig. 6b, the inset in the upper right corner of the first image is the second image, the inset in the upper right corner of the last image is the first image, and the images in the set are fifth images obtained by stylizing the second image using second networks obtained with different preset fusion weights; these fifth images have different stylization intensities. As can be seen from fig. 6b, in the fifth images shown from left to right, the proportion of the style of the first image gradually increases, i.e., the stylization intensity gradually increases.
According to the image stylization intensity adjusting method provided by the embodiment of the invention, the adjustment of the stylization intensity of the image is realized, and the stylization processing of any one of the two images can be conveniently and rapidly carried out by utilizing the image conversion network corresponding to the fused style of the two images, so that the style transition image with the corresponding stylization intensity is obtained, the image stylization processing efficiency is improved, and the image stylization processing mode is optimized.
Fig. 7 is a block diagram showing a configuration of an image conversion network acquisition apparatus according to an embodiment of the present invention, which operates based on a trained first network, as shown in fig. 7, and includes: an acquisition module 711 and a mapping module 712.
The acquisition module 711 is adapted to: a first image and a second image are acquired.
The first image and the second image are two style images with different styles; they may be style images of any style and are not limited to style images of some specific style.
The mapping module 712 is adapted to: and respectively inputting the first image and the second image into a first network, and performing weighted operation on a weighted operation layer of the first network according to a preset fusion weight value to obtain a second network corresponding to the fused style of the first image and the second image.
Specifically, the sample images used for training the first network include: a plurality of first sample images stored in the style image library and a plurality of second sample images stored in the content image library. The mapping module 712 inputs the first image and the second image acquired by the acquiring module 711 into the first network respectively, performs a weighted operation in the weighted operation layer of the first network according to the preset fusion weights, and substitutes the weighted operation result into the remainder of the first network, so as to quickly obtain the second network corresponding to the fused style of the first image and the second image.
According to the image conversion network acquisition device provided by the embodiment of the invention, the acquisition module acquires the first image and the second image, the mapping module respectively inputs the first image and the second image into the first network, and the weighting operation is carried out on the weighting operation layer of the first network according to the preset fusion weight value, so that the second network corresponding to the fused style of the first image and the second image is obtained. According to the technical scheme provided by the invention, the trained first network can be used for quickly obtaining the image conversion network corresponding to the fused style of the two style images, so that the efficiency of obtaining the image conversion network is effectively improved, and the processing mode of the image conversion network is optimized.
Fig. 8 is a block diagram showing the connection structure of an image conversion network acquisition apparatus and an image stylized fusion processing apparatus according to another embodiment of the present invention. The image conversion network acquisition apparatus operates based on a trained first network. As shown in fig. 8, the image conversion network acquisition apparatus 810 includes an acquisition module 811, a first network training module 812, and a mapping module 813; the image stylized fusion processing apparatus 820 includes a first processing module 821.
The acquisition module 811 in the image conversion network acquisition device 810 is adapted to: a first image and a second image are acquired.
The first network training module 812 is adapted to: the first network is trained.
The training process of the first network is completed through multiple iterations. The first network training module 812 is adapted to: in one iteration, extract a first sample image from the style image library, extract at least one second sample image from the content image library, and train the first network using the first sample image and the at least one second sample image.
Optionally, the first network training module 812 is adapted to: generating a third sample image corresponding to the second sample image by using a third network corresponding to the style of the first sample image in an iteration process; and obtaining a first network loss function according to the style loss between the third sample image and the first sample image and the content loss between the third sample image and the second sample image, and realizing the training of the first network by using the first network loss function.
In a particular embodiment, the first network training module 812 may include: extraction unit 8121, generation unit 8122, processing unit 8123, and update unit 8124.
Specifically, the extraction unit 8121 is adapted to: extract a first sample image from the style image library and extract at least one second sample image from the content image library.
The generating unit 8122 is adapted to: and inputting the first sample image into the first network to obtain a third network corresponding to the style of the first sample image.
The third network is a conversion network obtained during the training of the first network. The generating unit 8122 is further adapted to: extract style texture features from the first sample image, and input the style texture features into the first network to obtain a third network corresponding to the style texture features.
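The patent does not pin down a specific texture descriptor. One common realization in style-transfer work, offered here only as a plausible sketch, extracts Gram matrices of pretrained VGG-16 feature maps; the layer indices and the use of torchvision are assumptions, not part of this disclosure.

```python
import torch
import torchvision.models as models

vgg_features = models.vgg16(pretrained=True).features.eval()  # assumed extractor

def gram_matrix(feat):
    # feat: (N, C, H, W) -> (N, C, C) channel-correlation ("texture") matrix
    n, c, h, w = feat.shape
    f = feat.view(n, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def style_texture_features(first_sample_image, layers=(3, 8, 15, 22)):
    """Collect Gram matrices at several VGG layers as style texture features."""
    feats, x = [], first_sample_image
    for i, layer in enumerate(vgg_features):
        x = layer(x)
        if i in layers:  # relu1_2, relu2_2, relu3_3, relu4_3 in VGG-16
            feats.append(gram_matrix(x))
    return feats
```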
The processing unit 8123 is adapted to: and generating corresponding third sample images respectively aiming at the at least one second sample image by utilizing a third network corresponding to the style of the first sample image.
The updating unit 8124 is adapted to: obtain a first network loss function according to the style loss between the at least one third sample image and the first sample image and the content loss between the at least one third sample image and the corresponding second sample image, and update the weight parameters of the first network according to the first network loss function. The specific form of the first network loss function can be set by those skilled in the art according to actual needs and is not limited herein. In one embodiment, the first network loss function may be:
$$\min_{\theta}\;\big(\lambda_c\, CP(I, I_c) + \lambda_s\, SP(I, I_s)\big)$$

where $I_c$ is the second sample image, $I_s$ is the first sample image, $I$ is the third sample image, $CP$ is the perceptual function for perceiving content differences, $SP$ is the perceptual function for perceiving style differences, $CP(I, I_c)$ is the content loss between the third sample image and the corresponding second sample image, $SP(I, I_s)$ is the style loss between the third sample image and the first sample image, $\theta$ is the weight parameter of the neural network, $\lambda_c$ is the preset content loss weight, and $\lambda_s$ is the preset style loss weight.
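As a hedged rendering in code: assuming CP is a mean-squared distance on perceptual feature maps and SP a mean-squared distance on Gram matrices (a common construction, not mandated by the patent), the per-sample loss could look like this, with content_feats and style_feats as assumed feature-extraction callables:

```python
import torch.nn.functional as F

def first_network_loss(I, I_c, I_s, lambda_c, lambda_s,
                       content_feats, style_feats):
    """lambda_c * CP(I, I_c) + lambda_s * SP(I, I_s) for one training triple.

    content_feats(img) -> perceptual feature map; style_feats(img) -> list of
    Gram matrices (e.g. as in the VGG sketch above). Both are assumptions."""
    content_loss = F.mse_loss(content_feats(I), content_feats(I_c))  # CP term
    style_loss = sum(F.mse_loss(g, g_s)                              # SP term
                     for g, g_s in zip(style_feats(I), style_feats(I_s)))
    return lambda_c * content_loss + lambda_s * style_loss
```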
The first network training module 812 runs iteratively until a predetermined convergence condition is met. Specifically, the predetermined convergence condition includes: the number of iterations reaching a preset number of iterations; and/or the output value of the first network loss function being smaller than a preset threshold; and/or the visual effect parameter of the third sample image corresponding to the second sample image reaching a preset visual effect parameter.
The first network training module 812 is further adapted to: keep one first sample image fixed while extracting at least one second sample image in turn; after all second sample images in the content image library have been extracted, switch to the next first sample image and again extract at least one second sample image. In this way, a first network applicable to images of any style and any content can be trained efficiently, which effectively shortens the time required to train the first network and improves its training efficiency.
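This extraction schedule is just an outer loop over first sample images with an inner loop over content batches, stopping once the predetermined convergence condition holds. A minimal sketch, where style_library, content_batches, and train_step are assumed stand-ins for the libraries and the per-iteration update described above:

```python
def train_first_network(style_library, content_batches, train_step,
                        max_iters=200_000, loss_threshold=1e-3):
    """Fix one first sample image, cycle through the content library in turn,
    then switch to the next first sample image; the iteration cap and loss
    threshold are illustrative convergence conditions."""
    iters = 0
    for first_sample in style_library:             # fixed per outer pass
        for second_samples in content_batches():   # at least one per iteration
            loss = train_step(first_sample, second_samples)
            iters += 1
            if iters >= max_iters or loss < loss_threshold:
                return  # predetermined convergence condition satisfied
```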
The mapping module 813 is adapted to: input the first image into the first network and perform one forward propagation in the first network to determine the weighted operation layer data corresponding to the first image; input the second image into the first network and perform one forward propagation in the first network to determine the weighted operation layer data corresponding to the second image; and, according to the preset fusion weight, perform a weighting operation on the weighted operation layer data corresponding to the first image and the weighted operation layer data corresponding to the second image in the weighted operation layer of the first network, obtaining the second network corresponding to the fused style of the first image and the second image.
In an embodiment of the present invention, the first network is a meta network obtained by training a neural network, and the second network is an image transformation network. Specifically, the weighted operation layer is a bottleneck layer, i.e., the layer with the smallest vector dimension among the convolution layers of the first network. The weighted operation layer data may be vector data.
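Under these definitions, obtaining the second network costs only two forward propagations and one interpolation at the bottleneck layer. The sketch below assumes the trained first network can be split into an encoder (up to the bottleneck) and a decoder (bottleneck vector to image transformation network parameters); the split and the names are illustrative of the mechanism, not the patent's exact architecture:

```python
import torch

# Assumed halves of the trained first network (illustrative names):
#   meta_encoder: style image -> bottleneck vector (weighted operation layer data)
#   meta_decoder: bottleneck vector -> parameters of an image transformation network

def build_fused_network(first_image, second_image, w):
    """Weighted operation at the bottleneck layer with preset fusion weight w."""
    with torch.no_grad():
        h1 = meta_encoder(first_image)    # one forward propagation, first image
        h2 = meta_encoder(second_image)   # one forward propagation, second image
        h = w * h1 + (1.0 - w) * h2       # weighted operation layer fusion
        return meta_decoder(h)            # second network for the fused style
```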
The first processing module 821 in the image stylized fusion processing apparatus 820 is adapted to: stylize a third image to be processed using the second network obtained by the image conversion network acquisition apparatus 810, obtaining a fourth image corresponding to the third image.
With the technical solution provided by the invention, the trained first network quickly determines the weighted operation layer data corresponding to the two style images, and the image conversion network corresponding to their fused style is conveniently obtained from that data, further improving the efficiency of obtaining an image conversion network. In addition, an image can be stylized conveniently and quickly using the image conversion network corresponding to the fused style of the two images, yielding a style migration image with the fused style; this realizes the fusion of image styles, improves the efficiency of image stylization processing, and optimizes the image stylization processing mode.
Fig. 9 is a block diagram illustrating the connection structure between an image conversion network acquisition apparatus and an image stylization intensity adjustment apparatus according to another embodiment of the present invention. As shown in fig. 9, the image conversion network acquisition apparatus in this embodiment is the apparatus 810 shown in fig. 8, and its description is not repeated here. The image stylization intensity adjustment apparatus 920 includes a second processing module 921.
The second processing module 921 is adapted to: stylize either of the first image and the second image using the second network obtained by the image conversion network acquisition apparatus 810, obtaining a corresponding fifth image.
With this technical solution, the stylization intensity of an image can be adjusted: either of the two images can be stylized conveniently and quickly using the image conversion network corresponding to their fused style, yielding a style migration image with the corresponding stylization intensity. This improves the efficiency of image stylization processing and optimizes the image stylization processing mode.
The invention also provides a terminal comprising the above image conversion network acquisition apparatus. The terminal may be a mobile phone, a PAD, a computer, a camera device, or the like.
The invention also provides a non-volatile computer storage medium storing at least one executable instruction that causes a processor to perform the image conversion network acquisition method in any of the above method embodiments. The computer storage medium may be a memory card of a mobile phone, a memory card of a PAD, a magnetic disk of a computer, a memory card of a camera device, or the like.
Fig. 10 is a schematic structural diagram of a computing device according to an embodiment of the present invention, and the specific embodiment of the present invention does not limit the specific implementation of the computing device. The computing device can be a mobile phone, a PAD, a computer, a camera device, a server, and the like.
As shown in fig. 10, the computing device may include: a processor 1002, a communication interface 1004, a memory 1006, and a communication bus 1008.
Wherein:
the processor 1002, communication interface 1004, and memory 1006 communicate with each other via a communication bus 1008.
A communication interface 1004 for communicating with network elements of other devices, such as clients or other servers.
The processor 1002 is configured to execute the program 1010, and may specifically execute the relevant steps in the above embodiments of the image conversion network acquisition method.
In particular, the program 1010 may include program code that includes computer operating instructions.
The processor 1002 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement an embodiment of the present invention. The computing device includes one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 1006 is used for storing the program 1010. The memory 1006 may comprise high-speed RAM memory, and may also include non-volatile memory, such as at least one disk memory.
The program 1010 may be specifically configured to cause the processor 1002 to execute the image conversion network acquisition method in any of the method embodiments described above. For the specific implementation of each step in the program 1010, reference may be made to the corresponding steps and unit descriptions in the above image conversion network acquisition embodiments, which are not repeated here. Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the devices and modules described above may refer to the corresponding process descriptions in the foregoing method embodiments, which are likewise not repeated here.
The invention also provides a non-volatile computer storage medium storing at least one executable instruction that causes a processor to perform the image stylized fusion processing method in any of the above method embodiments. The computer storage medium may be a memory card of a mobile phone, a memory card of a PAD, a magnetic disk of a computer, a memory card of a camera device, or the like.
The present invention also provides a computing device comprising a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus; the memory is used to store at least one executable instruction that causes the processor to perform the operations corresponding to the image stylized fusion processing method. The computing device may be a mobile phone, a PAD, a computer, a camera device, etc. Its schematic structure is the same as that of the computing device shown in fig. 10 and is not repeated here.
The invention also provides a non-volatile computer storage medium storing at least one executable instruction that causes a processor to perform the image stylization intensity adjustment method in any of the above method embodiments. The computer storage medium may be a memory card of a mobile phone, a memory card of a PAD, a magnetic disk of a computer, a memory card of a camera device, or the like.
The present invention also provides a computing device comprising a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus; the memory is used to store at least one executable instruction that causes the processor to perform the operations corresponding to the image stylization intensity adjustment method. The computing device may be a mobile phone, a PAD, a computer, a camera device, etc. Its schematic structure is the same as that of the computing device shown in fig. 10 and is not repeated here.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general-purpose systems may also be used with the teachings herein, and the structure required to construct such a system is apparent from the description above. Moreover, the present invention is not directed to any particular programming language; a variety of programming languages may be used to implement the teachings of the invention as described herein, and any description of a specific language above is provided to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features are sometimes grouped together in a single embodiment, figure, or description thereof to streamline the disclosure and aid understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components in accordance with embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.

Claims (32)

1. An image conversion network acquisition method, the method being performed based on a trained first network, the method comprising:
acquiring a first image and a second image, wherein the first image and the second image are two images with different styles;
inputting the first image and the second image into the first network respectively, and performing weighted operation on a weighted operation layer of the first network according to a preset fusion weight to obtain a second network corresponding to the fused style of the first image and the second image;
wherein the training of the first network comprises: extracting a first sample image from the style image library, and extracting at least one second sample image from the content image library; inputting the first sample image into a first network to obtain a third network corresponding to the style of the first sample image; generating corresponding third sample images respectively aiming at least one second sample image by utilizing a third network corresponding to the style of the first sample image; obtaining a first network loss function according to the style loss between at least one third sample image and the first sample image and the content loss between at least one third sample image and the corresponding second sample image, and updating the weight parameter of the first network according to the first network loss function;
iteratively performing the training step of the first network until a predetermined convergence condition is satisfied, wherein iteratively performing the training step of the first network comprises: during multiple iterations, keeping a first sample image fixed while extracting at least one second sample image in turn; and after the second sample images in the content image library have been extracted, switching to the next first sample image and extracting at least one second sample image.
2. The method according to claim 1, wherein the inputting the first image and the second image into the first network, and performing a weighting operation on a weighting operation layer of the first network according to a preset fusion weight to obtain a second network corresponding to a fused style of the first image and the second image further comprises:
inputting the first image into the first network, performing forward propagation operation once in the first network, and determining weighted operation layer data corresponding to the first image;
inputting the second image into the first network, performing forward propagation operation in the first network once, and determining weighted operation layer data corresponding to the second image;
and according to a preset fusion weight, carrying out weighting operation on the weighted operation layer data corresponding to the first image and the weighted operation layer data corresponding to the second image in the weighted operation layer of the first network to obtain a second network corresponding to the fused style of the first image and the second image.
3. The method of claim 1, wherein the weighted operation layer is a bottleneck layer; the bottleneck layer is the layer with the smallest vector dimension among the convolutional layers of the first network.
4. The method of claim 2, wherein the weighted operation layer data is vector data.
5. The method of claim 1, wherein the sample image used for first network training comprises: a plurality of first sample images stored by the genre image library and a plurality of second sample images stored by the content image library.
6. The method of claim 1, wherein the training process of the first network is completed through a plurality of iterations; wherein, the one-time iteration process comprises the following steps:
generating a third sample image corresponding to the second sample image using a third network corresponding to the style of the first sample image;
obtaining a first network loss function according to the style loss between the third sample image and the first sample image and the content loss between the third sample image and the second sample image, and implementing the training of a first network by using the first network loss function.
7. The method of claim 1, wherein the predetermined convergence condition comprises: the iteration times reach the preset iteration times; and/or the output value of the first network loss function is smaller than a preset threshold value; and/or the visual effect parameter of the third sample image corresponding to the second sample image reaches a preset visual effect parameter.
8. The method of claim 1, wherein inputting the first sample image into the first network to obtain the third network corresponding to the style of the first sample image further comprises:
extracting style texture features from the first sample image;
and inputting the style texture features into a first network to obtain a third network corresponding to the style texture features.
9. The method according to any one of claims 1-8, wherein the first network is a meta network obtained by training a neural network, and the second network is an image transformation network.
10. The method according to any one of claims 1-8, wherein the third network is a conversion network obtained during training of the first network.
11. The method according to any one of claims 1-8, wherein the method is performed by a terminal.
12. An image stylized fusion processing method, the method comprising:
stylizing a third image to be processed using a second network obtained by the method of any of claims 1-11, resulting in a fourth image corresponding to the third image.
13. An image stylization intensity adjustment method, the method comprising:
stylizing either one of the first image and the second image using the second network obtained by the method of any one of claims 1-11 to obtain a corresponding fifth image.
14. An image conversion network acquisition device, the device operating based on a trained first network, the device comprising:
the device comprises an acquisition module, a processing module and a display module, wherein the acquisition module is suitable for acquiring a first image and a second image, and the first image and the second image are two images with different styles;
the mapping module is suitable for respectively inputting the first image and the second image into the first network, and performing weighted operation on a weighted operation layer of the first network according to a preset fusion weight value to obtain a second network corresponding to the fused style of the first image and the second image;
wherein the apparatus further comprises: a first network training module;
the first network training module is adapted to: extracting a first sample image from the style image library, and extracting at least one second sample image from the content image library; inputting the first sample image into a first network to obtain a third network corresponding to the style of the first sample image; generating corresponding third sample images respectively aiming at least one second sample image by utilizing a third network corresponding to the style of the first sample image; obtaining a first network loss function according to the style loss between at least one third sample image and the first sample image and the content loss between at least one third sample image and the corresponding second sample image, and updating the weight parameter of the first network according to the first network loss function;
The first network training module runs iteratively until a predetermined convergence condition is met, wherein the iterative operation comprises: during multiple iterations, keeping a first sample image fixed while extracting at least one second sample image in turn; and after the second sample images in the content image library have been extracted, switching to the next first sample image and extracting at least one second sample image.
15. The apparatus of claim 14, wherein the mapping module is further adapted to:
inputting the first image into the first network, performing forward propagation operation once in the first network, and determining weighted operation layer data corresponding to the first image;
inputting the second image into the first network, performing forward propagation operation in the first network once, and determining weighted operation layer data corresponding to the second image;
and according to a preset fusion weight, carrying out weighting operation on the weighted operation layer data corresponding to the first image and the weighted operation layer data corresponding to the second image in the weighted operation layer of the first network to obtain a second network corresponding to the fused style of the first image and the second image.
16. The apparatus of claim 14, wherein the weighted operation layer is a bottleneck layer; the bottleneck layer is the layer with the smallest vector dimension among the convolutional layers of the first network.
17. The apparatus of claim 15, wherein the weighted operation layer data is vector data.
18. The apparatus of claim 14, wherein the sample image used for first network training comprises: a plurality of first sample images stored by the genre image library and a plurality of second sample images stored by the content image library.
19. The apparatus of claim 14, wherein the apparatus further comprises: a first network training module; the training process of the first network is completed through multiple iterations;
the first network training module is adapted to: generating a third sample image corresponding to the second sample image by using a third network corresponding to the style of the first sample image in an iteration process;
obtaining a first network loss function according to the style loss between the third sample image and the first sample image and the content loss between the third sample image and the second sample image, and implementing the training of a first network by using the first network loss function.
20. The apparatus of claim 14, wherein the predetermined convergence condition comprises: the iteration times reach the preset iteration times; and/or the output value of the first network loss function is smaller than a preset threshold value; and/or the visual effect parameter of the third sample image corresponding to the second sample image reaches a preset visual effect parameter.
21. The apparatus of claim 14, wherein the first network training module is further adapted to:
extracting style texture features from the first sample image;
and inputting the style texture features into a first network to obtain a third network corresponding to the style texture features.
22. The apparatus of any one of claims 14-21, wherein the first network is a meta network obtained by training a neural network, and the second network is an image transformation network.
23. The apparatus of any one of claims 14-21, wherein the third network is a conversion network obtained during training of the first network.
24. A terminal comprising the image conversion network acquisition apparatus according to any one of claims 14 to 23.
25. An image stylized fusion processing apparatus, the apparatus comprising:
a first processing module adapted to perform a stylization process on a third image to be processed by using the second network obtained by the apparatus according to any one of claims 14 to 23, so as to obtain a fourth image corresponding to the third image.
26. An image stylization intensity adjustment apparatus, the apparatus comprising:
a second processing module adapted to perform a stylization process on any one of the first image and the second image using the second network obtained by the apparatus of any one of claims 14-23 to obtain a corresponding fifth image.
27. A computing device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operation corresponding to the image conversion network acquisition method according to any one of claims 1-11.
28. A computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform operations corresponding to the image conversion network acquisition method according to any one of claims 1 to 11.
29. A computing device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operation corresponding to the image stylized fusion processing method according to claim 12.
30. A computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the image stylized fusion processing method of claim 12.
31. A computing device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operation corresponding to the image stylization strength adjustment method according to claim 13.
32. A computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the image stylization intensity adjustment method of claim 13.
CN201710574583.2A 2017-07-14 2017-07-14 Image conversion network acquisition method and device, computing equipment and storage medium Active CN107516290B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710574583.2A CN107516290B (en) 2017-07-14 2017-07-14 Image conversion network acquisition method and device, computing equipment and storage medium


Publications (2)

Publication Number Publication Date
CN107516290A CN107516290A (en) 2017-12-26
CN107516290B true CN107516290B (en) 2021-03-19

Family

ID=60721877

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710574583.2A Active CN107516290B (en) 2017-07-14 2017-07-14 Image conversion network acquisition method and device, computing equipment and storage medium

Country Status (1)

Country Link
CN (1) CN107516290B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110555821B (en) * 2018-01-26 2022-04-05 腾讯科技(深圳)有限公司 Model training method, device and storage medium
CN108733439A (en) * 2018-03-26 2018-11-02 西安万像电子科技有限公司 Image processing method and device
CN109035318B (en) * 2018-06-14 2021-11-30 西安电子科技大学 Image style conversion method
CN109146825B (en) * 2018-10-12 2020-11-27 深圳美图创新科技有限公司 Photography style conversion method, device and readable storage medium
CN111080527B (en) * 2019-12-20 2023-12-05 北京金山云网络技术有限公司 Image super-resolution method and device, electronic equipment and storage medium
CN111784566B (en) * 2020-07-01 2022-02-08 北京字节跳动网络技术有限公司 Image processing method, migration model training method, device, medium and equipment
CN117015801A (en) * 2021-03-15 2023-11-07 上海联影医疗科技股份有限公司 System and method for image segmentation
CN113111791B (en) * 2021-04-16 2024-04-09 深圳市格灵人工智能与机器人研究院有限公司 Image filter conversion network training method and computer readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102682420A (en) * 2012-03-31 2012-09-19 北京百舜华年文化传播有限公司 Method and device for converting real character image to cartoon-style image
CN103198469A (en) * 2011-10-31 2013-07-10 卡西欧计算机株式会社 Image processing apparatus and image processing method
CN106778928A (en) * 2016-12-21 2017-05-31 广州华多网络科技有限公司 Image processing method and device


Also Published As

Publication number Publication date
CN107516290A (en) 2017-12-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant