WO2023116744A1 - Image processing method and apparatus, device, and medium - Google Patents


Info

Publication number
WO2023116744A1
Authority
WO
WIPO (PCT)
Prior art keywords
style
sample image
image
network
generation network
Prior art date
Application number
PCT/CN2022/140574
Other languages
French (fr)
Chinese (zh)
Inventor
黄奇伟
Original Assignee
北京字跳网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司
Publication of WO2023116744A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 2D [Two Dimensional] image generation
    • G06T 11/60 Editing figures and text; Combining figures or text
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0475 Generative networks
    • G06N 3/08 Learning methods
    • G06N 3/094 Adversarial learning
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/82 Arrangements for image or video recognition or understanding using neural networks
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Definitions

  • The present disclosure relates to the technical field of computer vision, and in particular to an image processing method, apparatus, device, and medium.
  • The present disclosure provides an image processing method, apparatus, device, and medium.
  • An embodiment of the present disclosure provides an image processing method, the method comprising: acquiring a first object feature of a first style sample image, and training a first generative adversarial network according to the first object feature and the first style sample image; acquiring a second object feature of a second style sample image, and training a second generative adversarial network according to the second object feature and the second style sample image; and performing fusion processing on the first generative adversarial network and the second generative adversarial network to generate a style conversion network, so as to perform image style conversion processing on images of the first style and the second style according to the style conversion network.
  • An embodiment of the present disclosure also provides an image processing apparatus, the apparatus comprising: a first training module, configured to acquire a first object feature of a first style sample image and train a first generative adversarial network according to the first object feature and the first style sample image; a second training module, configured to acquire a second object feature of a second style sample image and train a second generative adversarial network according to the second object feature and the second style sample image; and a fusion module, configured to perform fusion processing on the first generative adversarial network and the second generative adversarial network to generate a style conversion network, so as to perform image style conversion processing on images of the first style and the second style according to the style conversion network.
  • An embodiment of the present disclosure also provides an electronic device, the electronic device comprising: a processor; and a memory for storing instructions executable by the processor; where the processor is configured to read the executable instructions from the memory and execute the instructions to implement the image processing method provided by the embodiments of the present disclosure.
  • An embodiment of the present disclosure also provides a computer-readable storage medium storing a computer program, where the computer program is used to execute the image processing method provided by the embodiments of the present disclosure.
  • FIG. 1 is a schematic diagram of an image processing scene provided by an embodiment of the present disclosure.
  • FIG. 2 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure
  • FIG. 3 is a schematic flowchart of another image processing method provided by an embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram of another image processing scenario provided by an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of another image processing scenario provided by an embodiment of the present disclosure.
  • FIG. 6 is a schematic flowchart of another image processing method provided by an embodiment of the present disclosure.
  • FIG. 7 is a schematic diagram of another image processing scenario provided by an embodiment of the present disclosure.
  • FIG. 8 is a schematic flowchart of another image processing method provided by an embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of another image processing scenario provided by an embodiment of the present disclosure.
  • FIG. 10 is a schematic diagram of another image processing scenario provided by an embodiment of the present disclosure.
  • FIG. 11 is a schematic flowchart of another image processing method provided by an embodiment of the present disclosure.
  • FIG. 12 is a schematic flowchart of another image processing method provided by an embodiment of the present disclosure.
  • FIG. 13 is a schematic diagram of another image processing scenario provided by an embodiment of the present disclosure.
  • FIG. 14 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present disclosure.
  • FIG. 15 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • In the present disclosure, the term “comprise” and its variations are open-ended, i.e., “including but not limited to”.
  • The term “based on” means “based at least in part on”.
  • The term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one further embodiment”; the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms will be given in the description below.
  • To this end, the present disclosure proposes a network training method that does not need to perform style conversion processing on the original image in advance to obtain training sample images.
  • In this method, as shown in Figure 1, two generative adversarial networks A and B are trained, where network A only processes sample images of the first style so that A can output an image of the first style for an input image, and network B only processes sample images of the second style so that B can output an image of the second style; further, the style conversion network can be obtained by fusing A and B.
  • In this way, the training of the style conversion network can be realized based on the original sample images of the first style and the sample images of the second style, without pre-transforming the sample images from the first style to the second style. A rough illustration of this scheme is sketched below.
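  • As a rough outline of this training scheme (the function names below are hypothetical placeholders, not the patent's implementation), each network sees only samples of its own style, so no pre-converted style-1-to-style-2 training pairs are needed:

```python
# Illustrative outline only: train one GAN per style, then fuse.
gan_a = train_gan(first_style_samples)    # learns to output first-style images
gan_b = train_gan(second_style_samples)   # learns to output second-style images
style_converter = fuse(gan_a, gan_b)      # weighted fusion, detailed below
```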
  • FIG. 2 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure.
  • The method can be executed by an image processing apparatus, where the apparatus can be implemented by software and/or hardware and can generally be integrated in an electronic device. As shown in FIG. 2, the method includes:
  • Step 201: Acquire a first object feature of a first style sample image, and train a first generative adversarial network according to the first object feature and the first style sample image.
  • The first style sample image corresponds to the second style sample image mentioned in subsequent embodiments; the first style sample image and the second style sample image may be sample images of any two different styles. For example, the first style sample image may be a human face image while the second style sample image is an animal face image, or the first style sample image may be a human face image without makeup while the second style sample image is an oil-painting-style sample image.
  • The first style sample image may be an original image with the first style acquired from a database, or a first style sample image obtained after further strengthening the first sub-style of the original image. For example, when the first style sample image is a plain (makeup-free) face image, the plain face image may be acquired directly from the relevant database as the first style sample image, or makeup may be removed from an acquired face image to obtain the first style sample image.
  • In this embodiment, the first object feature of the first style sample image is extracted; the first object feature is any feature reflecting the style of the first style sample image, including but not limited to pixel color features, key pixel position features, key point pixel semantic features, area contour features, and the like. Then, the first generative adversarial network is trained according to the first object feature and the first style sample image, so that the trained first generative adversarial network can extract the first object feature of an input image to obtain a style sample image with the first style.
  • Step 202: Acquire a second object feature of a second style sample image, and train a second generative adversarial network according to the second object feature and the second style sample image.
  • In this embodiment, the second style sample image and the first style sample image are acquired separately in the training stage, and the second style sample image is not obtained by processing the first style sample image, so the consumption of computing power is relatively low, which further helps to improve the training efficiency of the style conversion network.
  • Similarly, the second object feature of the second style sample image is extracted; the second object feature is any feature reflecting the style of the second style sample image, including but not limited to pixel color features, key pixel position features, key point pixel semantic features, area contour features, and the like. Then, the second generative adversarial network is trained according to the second object feature and the second style sample image, so that the trained second generative adversarial network can extract the second object feature of an input image to obtain a style sample image with the second style.
  • Step 203: Perform fusion processing on the first generative adversarial network and the second generative adversarial network to generate a style conversion network, so as to perform image style conversion processing on images of the first style and the second style according to the style conversion network.
  • In this embodiment, the first generative adversarial network and the second generative adversarial network are fused to generate a style conversion network, so that the style conversion network can convert an input image both to the first style and to the second style, thereby realizing image style conversion processing of images of the first style and the second style based on the style conversion network.
  • In some embodiments, fusing the first generative adversarial network and the second generative adversarial network to generate a style conversion network includes:
  • Step 301: Determine a first weight corresponding to the first generative adversarial network and a second weight corresponding to the second generative adversarial network according to the similarity between the first object feature and the second object feature.
  • In this embodiment, the similarity between the first object feature and the second object feature is determined; the similarity reflects how close, in the feature dimension, the image output by the first generative adversarial network is to the image output by the second generative adversarial network. If the similarity is low, the weights of the two networks strongly affect the final style-converted image: if the first weight corresponding to the first generative adversarial network is larger than the second weight corresponding to the second generative adversarial network, the output style-converted image leans toward the first style; conversely, if the first weight is smaller than the second weight, the output style-converted image leans toward the second style.
  • In some embodiments, the first object feature and the second object feature can be input into a pre-trained deep learning model to obtain the similarity between the first object feature and the second object feature.
  • In other embodiments, multiple first key points of the input first style image may be extracted to obtain the first object feature of each first key point, and multiple second key points of the input second style image may be extracted to obtain the second object feature of each second key point, where the first key points and the second key points may include facial landmark points such as the nose, the corners of the eyes, and the lips. Then, for each key point shared by the first key points and the second key points, the key point similarity between its first object feature and its second object feature is calculated, and the mean value of the key point similarities over all key points is used as the similarity between the first object feature and the second object feature.
  • In some embodiments, the correspondence between the first weight and the similarity may be pre-built according to the needs of the scene; after the first weight is obtained by querying this correspondence, the second weight is obtained based on the first weight.
  • In other embodiments, the difference between the similarity and a preset standard similarity can be calculated, a preset correspondence can be queried based on the difference to obtain a weight correction value, the first weight can be obtained as the sum of a standard first weight and the weight correction value, and the second weight can then be obtained based on the first weight. A minimal sketch of this weighting is given below.
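  • The following is a minimal sketch of one way to realize this weighting; the cosine similarity over matched key-point features and the linear weight correction are assumptions, since the patent does not fix these formulas:

```python
import numpy as np

def keypoint_similarity(feats_a, feats_b):
    """Mean cosine similarity over matching key-point features (illustrative)."""
    sims = [np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
            for a, b in zip(feats_a, feats_b)]
    return float(np.mean(sims))

def fusion_weights(similarity, standard_similarity=0.5,
                   standard_first_weight=0.5, gain=0.5):
    """First weight = standard weight + a correction derived from the
    similarity difference; the linear lookup here is a placeholder."""
    correction = gain * (similarity - standard_similarity)
    w1 = float(np.clip(standard_first_weight + correction, 0.0, 1.0))
    return w1, 1.0 - w1  # second weight complements the first
```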
  • Step 302: Obtain a first product result of the output image of the first generative adversarial network and the first weight, obtain a second product result of the output image of the second generative adversarial network and the second weight, and fuse the first product result and the second product result to generate the style conversion network.
  • In this embodiment, the first product result of the output image of the first generative adversarial network and the first weight is obtained, where the first product result corresponds to the first style; the second product result of the output image of the second generative adversarial network and the second weight is obtained, where the second product result corresponds to the second style; and the first product result and the second product result are fused to obtain the fusion result of the first style and the second style, thereby generating the style conversion network. The output image in this embodiment can be regarded as a variable, and the style conversion network is a combination of network parameters that processes this variable. A concrete sketch of this weighted fusion follows.
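  • Concretely, the fused output can be written as a weighted sum of the two generators' outputs; a minimal sketch (the module names are assumptions, treating each trained generator as a callable):

```python
def style_convert(image, gan_a, gan_b, w1, w2):
    """Fuse the two generators' outputs with the weights from Step 301."""
    out_a = gan_a(image)   # first product input: first-style rendering
    out_b = gan_b(image)   # second product input: second-style rendering
    return w1 * out_a + w2 * out_b
```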
  • In some embodiments, the outputs of the first generative adversarial network and the second generative adversarial network can be connected to an alignment network, which is used to align the output objects of the two networks; the alignment processing includes one or more of pose alignment, pixel color alignment, and the like. The alignment algorithm may be matrix alignment, feature point alignment, etc., which will not be described in detail here. In this case, the style conversion network includes the first generative adversarial network, the second generative adversarial network, and the corresponding alignment network.
  • the image processing method of the embodiment of the present disclosure when training the style conversion network, does not need to pre-process the acquisition of the first style sample image and the second style sample image corresponding to the first style sample image, that is, it does not need to consume computing power to image
  • the sample image of the first style and the sample image of the second style can be fused to obtain
  • the style conversion network that performs style conversion processing reduces the training computing power consumption of the style network.
  • In summary, the image processing method of the embodiments of the present disclosure acquires the first object feature of the first style sample image, trains the first generative adversarial network according to the first object feature and the first style sample image, then acquires the second object feature of the second style sample image, trains the second generative adversarial network according to the second object feature and the second style sample image, performs fusion processing on the first generative adversarial network and the second generative adversarial network to generate a style conversion network, and performs image style conversion processing on images of the first style and the second style according to the style conversion network.
  • In some embodiments, acquiring the first object feature of the first style sample image and training the first generative adversarial network according to the first object feature and the first style sample image includes:
  • Step 601: Perform key point segmentation detection on the first object in the first style sample image, and extract key area contour features of the first object.
  • The first object is an entity object whose style is to be converted, including but not limited to a human face, clothing, and the like.
  • In this embodiment, key point segmentation detection is performed on the first object in the first style sample image, and the key area contour features of the first object are extracted; that is to say, different regions of the first object are recognized based on key point detection technology, and the first object is divided into a plurality of key areas, so that subsequent image processing can be performed at the granularity of key areas.
  • The key points used in key point segmentation detection can be pre-defined, or can be learned from experimental data.
  • For example, when the first object is a human face, the corresponding key points can be key points of the nose area, the left eye area, the right eye area, the mouth area, and other face areas; the key area contour features are then extracted based on these key points, and the contour features include but are not limited to the pixel positions corresponding to the outline of each key area and the positional relationships between the pixels.
  • Step 602: Process the key area contour features of the first object with the generation network in the first generative adversarial network to be trained, to generate a first reference sample image.
  • In this embodiment, the key area contour features of the first object are processed by the generation network in the first generative adversarial network to be trained to generate the first reference sample image, where the first reference sample image is a first-style image generated in the feature dimension of the key area contours.
  • Step 603: Determine a first loss function according to the first style sample image and the first reference sample image.
  • In this embodiment, the first generative adversarial network can be trained through the first loss function between the first reference sample image and the first style sample image.
  • In different application scenarios, the first loss function is calculated in different ways; examples are as follows:
  • In some embodiments, the optical flow field from the first reference sample image to the first style sample image can be calculated, that is, the motion optical flow field of the same key points from the first reference sample image to the first style sample image, and the first loss function is determined based on the motion optical flow field. The motion optical flow field identifies the alignment error between the first reference sample image and the first style sample image; the larger the optical flow field, the larger the error between the first reference sample image and the first style sample image. One possible realization is sketched below.
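  • A sketch of an optical-flow-based alignment loss using OpenCV's Farneback estimator; the choice of estimator and the use of mean flow magnitude as the scalar loss are assumptions:

```python
import cv2
import numpy as np

def optical_flow_loss(reference_img, style_img):
    """Mean optical-flow magnitude between the two images as alignment error."""
    ref = cv2.cvtColor(reference_img, cv2.COLOR_BGR2GRAY)
    tgt = cv2.cvtColor(style_img, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(ref, tgt, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    return float(np.linalg.norm(flow, axis=2).mean())  # larger => worse alignment
```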
  • In other embodiments, the first reference sample image is divided into multiple grid blocks, and the first style sample image is also divided into grid blocks according to the same grid division strategy. For each grid block, the pixel mean value of all pixels contained in it is calculated, and the first loss function is determined based on the differences between the pixel mean values of grid blocks at corresponding positions in the first reference sample image and the first style sample image; for example, the mean of the differences of the pixel mean values over all grid blocks is used as the first loss function, as in the sketch below.
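  • The grid-block variant can be sketched as follows; the grid size is an assumption, since the patent only requires that both images use the same division strategy:

```python
import numpy as np

def grid_mean_loss(reference_img, style_img, grid=8):
    """Mean absolute difference of per-block pixel means over a grid."""
    h, w = reference_img.shape[:2]
    bh, bw = h // grid, w // grid
    diffs = []
    for i in range(grid):
        for j in range(grid):
            ref_block = reference_img[i*bh:(i+1)*bh, j*bw:(j+1)*bw]
            sty_block = style_img[i*bh:(i+1)*bh, j*bw:(j+1)*bw]
            diffs.append(abs(ref_block.mean() - sty_block.mean()))
    return float(np.mean(diffs))
```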
  • Step 604: Perform back propagation according to the first loss function to train the first generative adversarial network.
  • In this embodiment, back propagation is performed according to the first loss function to train the first generative adversarial network; that is, the network parameters of the first generative adversarial network to be trained are adjusted so that, after its parameters are adjusted, the network can output related images with a consistent first style.
  • In this embodiment, the training method of the second generative adversarial network can be consistent with that of the first generative adversarial network.
  • In some embodiments, acquiring the second object feature of the second style sample image and training the second generative adversarial network according to the second object feature and the second style sample image includes:
  • Step 801: Perform key point segmentation detection on the second object in the second style sample image, and extract key area contour features of the second object.
  • The second object is an entity object whose style is to be converted, including but not limited to a human face, clothing, and the like.
  • The first object can be consistent with the second object; for example, when the first object is a human face, the second object is also a human face. The first object and the second object can also be inconsistent; for example, the first object is a human face while the second object is a cat face or the like.
  • In this embodiment, key point segmentation detection is performed on the second object in the second style sample image, and the key area contour features of the second object are extracted; that is to say, different regions of the second object are recognized based on key point detection technology, and the second object is divided into a plurality of key areas, so that subsequent image processing can be performed at the granularity of key areas.
  • The key points used in key point segmentation detection can be pre-defined, or can be learned from experimental data.
  • For example, when the second object is a human face, the corresponding key points can be key points of the nose area, the left eye area, the right eye area, the mouth area, and other face areas; the contour features include but are not limited to the pixel positions corresponding to the outline of each key area and the positional relationships between the pixels.
  • Step 802: Process the key area contour features of the second object with the generation network in the second generative adversarial network to be trained, to generate a second reference sample image.
  • In this embodiment, the key area contour features of the second object are processed by the generation network in the second generative adversarial network to be trained to generate the second reference sample image, where the second reference sample image is a second-style image generated in the feature dimension of the key area contours.
  • Step 803: Determine a second loss function according to the second style sample image and the second reference sample image.
  • In this embodiment, the second generative adversarial network can be trained through the second loss function between the second reference sample image and the second style sample image.
  • In different application scenarios, the second loss function is calculated in different ways; examples are as follows:
  • In some embodiments, the optical flow field from the second reference sample image to the second style sample image can be calculated, that is, the motion optical flow field of the same key points from the second reference sample image to the second style sample image, and the second loss function is determined based on the motion optical flow field. The motion optical flow field identifies the alignment error between the second reference sample image and the second style sample image; the larger the optical flow field, the larger the error between the second reference sample image and the second style sample image.
  • In other embodiments, the second reference sample image is divided into multiple grid blocks, and the second style sample image is also divided into grid blocks according to the same grid division strategy. For each grid block, the pixel mean value of all pixels contained in it is calculated, and the second loss function is determined based on the differences between the pixel mean values of grid blocks at corresponding positions in the second reference sample image and the second style sample image; for example, the mean of the differences of the pixel mean values over all grid blocks is used as the second loss function.
  • Step 804: Perform back propagation according to the second loss function to train the second generative adversarial network.
  • In this embodiment, back propagation is performed according to the second loss function to train the second generative adversarial network; that is, the network parameters of the second generative adversarial network to be trained are adjusted so that, after its parameters are adjusted, the network can output related images with a consistent second style.
  • For example, first object key point segmentation detection is performed on the first style sample image (for example, face key point segmentation detection based on face parsing technology) to obtain the key area contour feature mask1 of the first object. After mask1 is obtained, mask1 is encoded to obtain a first encoding result, the first style sample image is encoded to obtain a second encoding result, and a first feature image is obtained by fusing the first encoding result and the second encoding result. The first feature image embodies the contour features of the first object in the key area contour dimension on the one hand, and retains the original first style features by combining the original first style sample image on the other hand.
  • Further, a second feature map is obtained and input into the first generative adversarial network to obtain a corresponding third reference sample image; the loss value between the third reference sample image and the first style sample image is calculated, and if the loss value is greater than a preset threshold, the network parameters of the first generative adversarial network are adjusted until the loss value is less than the preset threshold, at which point the training of the first generative adversarial network is completed. One possible shape of such an update step is sketched below.
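  • One parameter update of this scheme might look as follows; the encoder modules, the additive fusion, and the L1 loss are placeholders, since the patent only specifies encoding, fusion, and a loss threshold:

```python
import torch.nn.functional as F

def training_step(gan, mask_encoder, image_encoder, mask1, style_image,
                  optimizer, threshold=0.05):
    """Encode mask and image, fuse, generate, and update if loss > threshold."""
    fused = mask_encoder(mask1) + image_encoder(style_image)  # feature image
    reference = gan.generator(fused)           # third reference sample image
    loss = F.l1_loss(reference, style_image)   # placeholder loss function
    if loss.item() > threshold:                # keep training above threshold
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return loss.item()
```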
  • Similarly, second object key point segmentation detection is performed on the second style sample image (for example, face key point segmentation detection based on face parsing technology) to obtain the key area contour feature mask2 of the second object. After mask2 is obtained, mask2 is encoded to obtain a third encoding result, the second style sample image is encoded to obtain a fourth encoding result, and a third feature image is obtained by fusing the third encoding result and the fourth encoding result. The third feature image embodies the contour features of the second object in the key area contour dimension on the one hand, and retains the original second style features by combining the original second style sample image on the other hand.
  • Further, a fourth feature map is obtained and input into the second generative adversarial network to obtain a corresponding fourth reference sample image; the loss value between the fourth reference sample image and the second style sample image is calculated, and if the loss value is greater than a preset threshold, the network parameters of the second generative adversarial network are adjusted until the loss value is less than the preset threshold.
  • In summary, the image processing method of the embodiments of the present disclosure can train the generative adversarial networks according to the needs of the scene in combination with key area contour features, improving the training efficiency of the generative adversarial networks while ensuring their training accuracy.
  • In the above embodiments, only the distance between the output of the generative adversarial network and the corresponding positive sample image is considered when calculating the relevant loss function. This calculation method may cause the output image to lack details and be too smooth. Therefore, in some embodiments, the generative adversarial network is trained in combination with negative sample images; that is, negative sample images can also be incorporated when calculating the relevant loss function.
  • In the following, the first style being an oil painting style and the second style being a plain makeup style is taken as an example to illustrate the training process of the generative adversarial networks.
  • In some embodiments, determining the first loss function according to the first style sample image and the first reference sample image includes:
  • Step 1101: Perform fusion and noise addition processing on the first style sample image and the first reference sample image to generate a first negative sample image.
  • In this embodiment, the first style sample image and the first reference sample image are fused, and random noise may be added to the fused image to obtain the first negative sample image.
  • In this way, the first negative sample image not only introduces the error of the first reference sample image, but also introduces a noise error.
  • Step 1102: Extract first high-frequency information of the first style sample image, second high-frequency information of the first reference sample image, and third high-frequency information of the first negative sample image.
  • In this embodiment, the first high-frequency information of the first style sample image, the second high-frequency information of the first reference sample image, and the third high-frequency information of the first negative sample image are extracted, where the high-frequency information of an image can be understood as the pixel information of pixels with large brightness differences and richer details, and the like.
  • Step 1103: Perform discriminative processing on the first high-frequency information, the second high-frequency information, and the third high-frequency information with the discriminant network in the first generative adversarial network to generate corresponding discriminant scores.
  • In this embodiment, the first high-frequency information, the second high-frequency information, and the third high-frequency information are discriminated by the discriminant network in the first generative adversarial network to generate corresponding discriminant scores, which represent the degree to which the first high-frequency information, the second high-frequency information, and the third high-frequency information belong to the first style.
  • Step 1104: Determine the first loss function according to the discriminant scores.
  • In this embodiment, the first loss function is determined according to the discriminant scores. For example, the first squared absolute error value between the first high-frequency information and the second high-frequency information of the first reference sample image, and the second squared absolute error value between the first high-frequency information and the third high-frequency information of the first negative sample image, can be calculated directly, and the ratio of the first squared absolute error value to the second squared absolute error value is calculated as the first loss function. Alternatively, the ratio of a first difference to a second difference between the corresponding discriminant scores may be calculated as the first loss function.
  • In this way, the image output by the trained first generative adversarial network is not only close to the first style sample image at the feature level, but also far from the first negative sample image, thereby reducing the introduction of artifacts and noise and ensuring that the image output by the first generative adversarial network is of the oil painting style. A sketch of such a loss is given below.
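  • A sketch of the positive/negative high-frequency loss described above; the blur-residual extraction of high-frequency information and the use of MSE terms are assumptions:

```python
import torch.nn.functional as F

def high_freq(img, kernel_size=5):
    """High-frequency residual: image minus a local-average blur."""
    blur = F.avg_pool2d(img, kernel_size, stride=1, padding=kernel_size // 2)
    return img - blur

def first_loss(style_img, reference_img, negative_img, eps=1e-8):
    """Ratio of positive-pair to negative-pair squared errors on high-freq info."""
    hf_style = high_freq(style_img)
    hf_ref = high_freq(reference_img)
    hf_neg = high_freq(negative_img)
    pos = F.mse_loss(hf_ref, hf_style)   # pull output toward the style sample
    neg = F.mse_loss(hf_neg, hf_style)   # push output away from the negative
    return pos / (neg + eps)             # small when close to positive, far from negative
```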
  • In some embodiments, determining the second loss function according to the second style sample image and the second reference sample image includes:
  • Step 1201: Perform fusion and noise addition processing on the second style sample image and the second reference sample image to generate a second negative sample image.
  • In this embodiment, the second style sample image and the second reference sample image are fused, and random noise may be added to the fused image to obtain the second negative sample image.
  • In this way, the second negative sample image not only introduces the error of the second reference sample image, but also introduces a noise error.
  • Step 1202: Extract a first texture feature of the second style sample image, a second texture feature of the second reference sample image, and a third texture feature of the second negative sample image.
  • In this embodiment, the first texture feature of the second style sample image, the second texture feature of the second reference sample image, and the third texture feature of the second negative sample image are extracted, where the texture features reflect features such as the color and brightness of the pixels of the corresponding image that belong to the plain makeup style.
  • Step 1203: Determine the second loss function according to the first texture feature, the second texture feature, and the third texture feature.
  • In this embodiment, the discriminant network in the second generative adversarial network can be used to discriminate the first texture feature, the second texture feature, and the third texture feature to generate corresponding discriminant scores, which represent the degree to which the first texture feature, the second texture feature, and the third texture feature belong to the second style.
  • Further, the second loss function can be determined according to the discriminant scores. For example, the third squared absolute error value between the first texture feature and the second texture feature of the second reference sample image, and the fourth squared absolute error value between the first texture feature and the third texture feature of the second negative sample image, can be calculated directly, and the ratio of the third squared absolute error value to the fourth squared absolute error value is calculated as the second loss function.
  • In this way, the image output by the trained second generative adversarial network is not only close to the second style sample image at the feature level, but also far from the second negative sample image, thereby reducing the introduction of artifacts and noise and ensuring that the image output by the second generative adversarial network is of the plain makeup style.
  • During application, the key area contour features of the target object in an original image with the plain makeup style can be extracted first, where the target object includes but is not limited to the various facial parts mentioned above. Then, the original image with the plain makeup style and the key area contour features of the target object are encoded to generate feature data of the target object.
  • Further, the style conversion network performs image fusion processing on the feature data of the target object and the key area contour features of the target object, and the conversion to the oil painting style domain is performed based on the fused image to obtain a target image with the oil painting style.
  • In this embodiment, the pre-trained first generative adversarial network can extract, based on the input original image of the plain makeup style, the corresponding key contour features of the target object that reflect the plain makeup style. Then, the original image of the plain makeup style and the key area contour features of the target object in the plain makeup dimension are encoded to generate feature data of the target object, and image fusion processing is performed on the feature data of the target object and the key area contour features of the target object to obtain a new original image in the key area contour dimension with the plain makeup style.
  • The new original image is input into the pre-trained second generative adversarial network, which extracts the key contour features of the target object in the oil painting style dimension of the new original image. Then, the new original image and the key area contour features of the target object in the oil painting style dimension are encoded to generate new feature data of the target object. Since the second generative adversarial network can obtain an image of the second style based on the features of the input image, the second generative adversarial network obtains the target image of the oil painting style based on the new feature data. The weights by which the style conversion network acts on the first generative adversarial network and the second generative adversarial network are reflected in the products with the output results of each network; for details, refer to the above embodiments, which will not be repeated here.
  • In this way, the output fusion of the first generative adversarial network and the second generative adversarial network can be combined to obtain a corresponding oil-painting-style image whose details are rich and realistic. The inference chain is sketched roughly below.
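  • The inference flow above can be sketched roughly as follows; every method name here (extract_contours, encode, generator) is a hypothetical placeholder for the operations the embodiment describes, not the patent's API:

```python
def plain_to_oil(original, gan_a, gan_b, w1, w2):
    """Chain the two pre-trained generators and fuse their weighted outputs."""
    contours_a = gan_a.extract_contours(original)      # plain-makeup key areas
    feat_a = gan_a.encode(original, contours_a)        # feature data of the target
    new_original = gan_a.generator(feat_a)             # contour-dimension image
    contours_b = gan_b.extract_contours(new_original)  # oil-painting key areas
    feat_b = gan_b.encode(new_original, contours_b)    # new feature data
    target = gan_b.generator(feat_b)                   # oil-painting target image
    return w1 * new_original + w2 * target             # weighted output fusion
```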
  • In summary, the image processing method of the embodiments of the present disclosure trains each generative adversarial network with a feature-level loss that combines the distances of the input style sample image to the positive sample image and to the negative sample image, respectively. This improves the purity of the output image while ensuring the output quality of the generative adversarial network, so that the style conversion effect of the fused target image is consistent with the second style.
  • Based on the above embodiments, the present disclosure also proposes an image processing apparatus.
  • FIG. 14 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present disclosure.
  • The apparatus can be implemented by software and/or hardware, and can generally be integrated into an electronic device for image processing. As shown in FIG. 14, the apparatus includes:
  • a first training module 1610, configured to acquire the first object feature of the first style sample image, and train the first generative adversarial network according to the first object feature and the first style sample image;
  • a second training module 1620, configured to acquire the second object feature of the second style sample image, and train the second generative adversarial network according to the second object feature and the second style sample image; and
  • a fusion module 1630, configured to perform fusion processing on the first generative adversarial network and the second generative adversarial network to generate a style conversion network, so as to perform image style conversion processing on images of the first style and the second style according to the style conversion network.
  • The image processing apparatus provided by the embodiments of the present disclosure can execute the image processing method provided by any embodiment of the present disclosure, and has the corresponding functional modules and beneficial effects for executing the method.
  • The present disclosure also proposes a computer program product, including computer programs/instructions, which implement the image processing method in the above embodiments when executed by a processor.
  • FIG. 15 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • Referring to FIG. 15, it shows the schematic structure of an electronic device 1700 suitable for implementing an embodiment of the present disclosure.
  • The electronic device 1700 in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (tablet computers), PMPs (Portable Multimedia Players), and vehicle-mounted terminals (such as car navigation terminals), as well as stationary terminals such as digital TVs and desktop computers.
  • The electronic device shown in FIG. 15 is only an example, and should not impose any limitation on the functions and application scope of the embodiments of the present disclosure.
  • As shown in FIG. 15, the electronic device 1700 may include a processor (such as a central processing unit or a graphics processing unit) 1701, which may execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 1702 or a program loaded from a memory 1708 into a random access memory (RAM) 1703. The RAM 1703 also stores various programs and data necessary for the operation of the electronic device 1700.
  • The processor 1701, the ROM 1702, and the RAM 1703 are connected to each other through a bus 1704.
  • An input/output (I/O) interface 1705 is also connected to the bus 1704.
  • Generally, the following devices can be connected to the I/O interface 1705: an input device 1706 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output device 1707 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; a memory 1708 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 1709.
  • The communication device 1709 may allow the electronic device 1700 to perform wireless or wired communication with other devices to exchange data. While FIG. 15 shows the electronic device 1700 having various devices, it should be understood that it is not required to implement or have all of the devices shown; more or fewer devices may alternatively be implemented or provided.
  • In particular, the embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for executing the method shown in the flowchart.
  • The computer program may be downloaded and installed from a network via the communication device 1709, or installed from the memory 1708, or from the ROM 1702.
  • When the computer program is executed by the processor 1701, the above-mentioned functions defined in the image processing method of the embodiments of the present disclosure are executed.
  • It should be noted that the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • A computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer-readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • In some embodiments, the client and the server can communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication (e.g., a communication network) in any form or medium. Examples of communication networks include local area networks (“LANs”), wide area networks (“WANs”), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.
  • The above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • The above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to: acquire the first object feature of the first style sample image, and train the first generative adversarial network according to the first object feature and the first style sample image; then acquire the second object feature of the second style sample image, and train the second generative adversarial network according to the second object feature and the second style sample image; and perform fusion processing on the first generative adversarial network and the second generative adversarial network to generate a style conversion network, so as to perform image style conversion processing on images of the first style and the second style according to the style conversion network.
  • In this way, the processing power requirements for sample images during image style conversion are reduced, and the training efficiency of the style conversion network is improved on the premise of ensuring the effect of style conversion.
  • Computer program code for carrying out the operations of the present disclosure may be written in one or more programming languages or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the “C” language or similar programming languages.
  • The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • In the case involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, connected through the Internet using an Internet service provider such as AT&T, MCI, Sprint, EarthLink, MSN, GTE, etc.).
  • Each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • It should also be noted that the functions noted in the blocks may occur out of the order noted in the figures; for example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or may be implemented by a combination of dedicated hardware and computer instructions.
  • The units involved in the embodiments described in the present disclosure may be implemented by software or by hardware, where the name of a unit does not, under certain circumstances, constitute a limitation of the unit itself.
  • For example, without limitation, exemplary types of hardware logic components that can be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and so on.
  • In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • A machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard drive, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, compact disk read-only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • The present disclosure provides an image processing method, including: acquiring a first object feature of a first style sample image, and training a first generative adversarial network according to the first object feature and the first style sample image;
  • In the image processing method provided by the present disclosure, acquiring the first object feature of the first style sample image and training the first generative adversarial network according to the first object feature and the first style sample image includes:
  • In the image processing method provided by the present disclosure, acquiring the second object feature of the second style sample image and training the second generative adversarial network according to the second object feature and the second style sample image includes:
  • In the image processing method provided by the present disclosure, performing fusion processing on the first generative adversarial network and the second generative adversarial network to generate a style conversion network includes:
  • In some embodiments, the first style is an oil painting style, and the second style is a plain makeup style.
  • In the image processing method provided by the present disclosure, determining the first loss function according to the first style sample image and the first reference sample image includes:
  • determining the first loss function according to the discriminant scores.
  • In the image processing method provided by the present disclosure, determining the second loss function according to the second style sample image and the second reference sample image includes:
  • determining the second loss function according to the first texture feature, the second texture feature, and the third texture feature.
  • In the image processing method provided by the present disclosure, performing image style conversion processing on images of the first style and the second style according to the style conversion network includes:
  • The present disclosure provides an image processing apparatus, including:
  • a first training module, configured to acquire a first object feature of a first style sample image, and train a first generative adversarial network according to the first object feature and the first style sample image;
  • a second training module, configured to acquire a second object feature of a second style sample image, and train a second generative adversarial network according to the second object feature and the second style sample image;
  • the first training module is specifically used for:
  • the second training module is specifically used for:
  • the fusion module is specifically used for:
  • the first style is an oil painting style
  • the second style is a plain makeup style
  • the fusion module is specifically used for:
  • the first loss function is determined based on the discriminant score.
  • the second training module is specifically used for:
  • the second loss function is determined according to the first texture feature, the second texture feature, and the third texture feature.
  • the second training module is specifically used for:
  • the present disclosure provides an electronic device, including:
  • the processor is configured to read the executable instructions from the memory, and execute the instructions to implement any image processing method provided in the present disclosure.
  • the present disclosure provides a computer-readable storage medium, the storage medium storing a computer program, and the computer program being used to execute any image processing method provided by the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present invention relate to an image processing method and apparatus, a device, and a medium. The method comprises: obtaining a first object feature of a first style sample image, and training a first generative adversarial network according to the first object feature and the first style sample image; obtaining a second object feature of a second style sample image, and training a second generative adversarial network according to the second object feature and the second style sample image; and fusing the first generative adversarial network and the second generative adversarial network to generate a style conversion network, so as to perform image style conversion on images of the first style and the second style according to the style conversion network.

Description

Image processing method and apparatus, device, and medium
Cross-Reference to Related Applications
This application is based on, and claims priority to, Chinese Application No. 202111574622.1 filed on December 21, 2021, the disclosure of which is incorporated herein by reference in its entirety.
Technical Field
The present disclosure relates to the field of computer vision, and in particular to an image processing method and apparatus, a device, and a medium.
Background
With advances in computer vision, techniques that convert images between styles have been widely used in applications such as photo processing, since they can render an image in different styles.
In the related art, to realize image style conversion, sample images of different styles must be obtained in advance for each original image, and a network is trained on these sample images of different styles; the trained network then performs style conversion processing on input images.
Summary
The present disclosure provides an image processing method and apparatus, a device, and a medium.
An embodiment of the present disclosure provides an image processing method, including: acquiring a first object feature of a first style sample image, and training a first adversarial generation network according to the first object feature and the first style sample image; acquiring a second object feature of a second style sample image, and training a second adversarial generation network according to the second object feature and the second style sample image; and performing fusion processing on the first adversarial generation network and the second adversarial generation network to generate a style conversion network, so as to perform image style conversion processing on images of the first style and the second style according to the style conversion network.
An embodiment of the present disclosure further provides an image processing apparatus, including: a first training module configured to acquire a first object feature of a first style sample image, and train a first adversarial generation network according to the first object feature and the first style sample image; a second training module configured to acquire a second object feature of a second style sample image, and train a second adversarial generation network according to the second object feature and the second style sample image; and a fusion module configured to perform fusion processing on the first adversarial generation network and the second adversarial generation network to generate a style conversion network, so as to perform image style conversion processing on images of the first style and the second style according to the style conversion network.
An embodiment of the present disclosure further provides an electronic device, including: a processor; and a memory for storing instructions executable by the processor; the processor being configured to read the executable instructions from the memory and execute the instructions to implement the image processing method provided by the embodiments of the present disclosure.
An embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program, the computer program being used to execute the image processing method provided by the embodiments of the present disclosure.
Brief Description of the Drawings
The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent with reference to the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.
FIG. 1 is a schematic diagram of an image processing scene provided by an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of another image processing method provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of another image processing scene provided by an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of another image processing scene provided by an embodiment of the present disclosure;
FIG. 6 is a schematic flowchart of another image processing method provided by an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of another image processing scene provided by an embodiment of the present disclosure;
FIG. 8 is a schematic flowchart of another image processing method provided by an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of another image processing scene provided by an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of another image processing scene provided by an embodiment of the present disclosure;
FIG. 11 is a schematic flowchart of another image processing method provided by an embodiment of the present disclosure;
FIG. 12 is a schematic flowchart of another image processing method provided by an embodiment of the present disclosure;
FIG. 13 is a schematic diagram of another image processing scene provided by an embodiment of the present disclosure;
FIG. 14 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present disclosure;
FIG. 15 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for exemplary purposes only and are not intended to limit the protection scope of the present disclosure.
It should be understood that the steps described in the method implementations of the present disclosure may be executed in different orders and/or in parallel. In addition, method implementations may include additional steps and/or omit the illustrated steps. The scope of the present disclosure is not limited in this regard.
As used herein, the term "include" and its variations are open-ended, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one further embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different apparatuses, modules, or units, and are not used to limit the order of, or interdependence between, the functions performed by these apparatuses, modules, or units.
It should be noted that the modifiers "a/an" and "multiple" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
The names of messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are used for illustrative purposes only and are not used to limit the scope of these messages or information.
When a network is trained on sample images of different styles, sample images of different styles obviously need to be produced in advance from a large number of original images; this advance acquisition of sample images consumes considerable computing power, and the network training efficiency is low.
When training the above style conversion network, the original images need to be pre-processed to obtain sample images of different styles. For example, when training a style conversion network from a plain-makeup face to an oil painting style, plain-makeup face images need to be acquired and then processed into oil-painting-style face images. Such processing is not only difficult but also consumes considerable computing power, resulting in low network training efficiency.
To solve the above technical problem, the present disclosure proposes a network training method that does not require style conversion processing of original images in advance to obtain training sample images. In this method, as shown in FIG. 1, two adversarial generation networks A and B are provided, where network A processes only sample images of the first style, so that A can produce a first-style image for an input image, and network B processes only sample images of the second style, so that B can output a second-style image; the style conversion network is then obtained by fusing A and B. Thus, the training of the style conversion network can be realized based on the original first style sample images and second style sample images, without first converting sample images from the first style to the second style.
The image processing method is introduced below with reference to specific embodiments.
FIG. 2 is a schematic flowchart of an image processing method provided by an embodiment of the present disclosure. The method may be executed by an image processing apparatus, which may be implemented in software and/or hardware and may generally be integrated in an electronic device. As shown in FIG. 2, the method includes:
Step 201: acquire a first object feature of a first style sample image, and train a first adversarial generation network according to the first object feature and the first style sample image.
The first style sample image corresponds to the second style sample image mentioned in subsequent embodiments; the first style sample image and the second style sample image may be sample images of any two different styles. For example, if the first style sample image is a human face image, the second style sample image may be an animal face image; or, if the first style sample image is a plain-makeup face image, the second style sample image may be an oil-painting-style sample image.
In some possible embodiments, the first style sample image may be an image obtained from a database that originally has the first style, or may be obtained by further enhancing the first style of an original image. For example, when the first style sample image is a plain-makeup face image, a plain-makeup face image may be obtained directly from a relevant database as the first style sample image, or makeup removal may be performed on an acquired face image to obtain the first style sample image.
In this embodiment, after the first style sample image is acquired, a first object feature of the first style sample image is extracted. The first object feature is any feature reflecting the style characteristics of the first style sample image, including but not limited to pixel color features, key pixel position features, keypoint pixel semantic features, and region contour features. The first adversarial generation network is then trained according to the first object feature and the first style sample image, so that the trained first adversarial generation network can extract the first object feature of an input image and produce a style sample image having the first style.
Step 202: acquire a second object feature of a second style sample image, and train a second adversarial generation network according to the second object feature and the second style sample image.
In this embodiment, as described above, the second style sample image corresponds to the first style sample image. The second style sample image may be an image obtained from a database that originally has the second style, or may be obtained by further enhancing the second style of an original image. For example, when the second style sample image is an oil-painting-style face image, an oil painting face image may be obtained directly from a relevant database as the second style sample image, or the oil painting features of an acquired face image from a famous painting may be enhanced to obtain the second style sample image.
It should be noted that the second style sample images and the first style sample images in the training stage are acquired separately; the second style sample images are not obtained by processing the first style sample images. Therefore, the computing power consumption is low, which further helps improve the training efficiency of the style conversion network.
In this embodiment, after the second style sample image is acquired, a second object feature of the second style sample image is extracted. The second object feature is any feature reflecting the style characteristics of the second style sample image, including but not limited to pixel color features, key pixel position features, keypoint pixel semantic features, and region contour features. The second adversarial generation network is then trained according to the second object feature and the second style sample image, so that the trained second adversarial generation network can extract the second object feature of an input image and produce a style sample image having the second style.
Step 203: perform fusion processing on the first adversarial generation network and the second adversarial generation network to generate a style conversion network, so as to perform image style conversion processing on images of the first style and the second style according to the style conversion network.
In this embodiment, the first adversarial generation network and the second adversarial generation network are fused to generate the style conversion network. The style conversion network can therefore convert an input image not only to the first style but also to the second style, realizing image style conversion processing of first-style and second-style images based on the style conversion network.
It should be noted that, in different application scenarios, the first adversarial generation network and the second adversarial generation network may be fused into a style conversion network in different ways, as exemplified below:
In an embodiment of the present disclosure, as shown in FIG. 3, performing fusion processing on the first adversarial generation network and the second adversarial generation network to generate a style conversion network includes:
Step 301: determine, according to the similarity between the first object feature and the second object feature, a first weight corresponding to the first adversarial generation network and a second weight corresponding to the second adversarial generation network.
In this embodiment, the similarity between the first object feature and the second object feature is determined. This similarity reflects, in the feature dimension, how close the image output by the first adversarial generation network is to the image generated by the second adversarial generation network. If the similarity is low, the weights used when fusing the two adversarial generation networks will affect the final style-converted image: if the first weight corresponding to the first adversarial generation network is larger relative to the second weight corresponding to the second adversarial generation network, the output style-converted image leans more toward the first style; conversely, if the first weight is smaller relative to the second weight, the output style-converted image leans more toward the second style.
In this embodiment, the first object feature and the second object feature may be input into a pre-trained deep learning model to obtain the similarity between the first object feature and the second object feature.
In another embodiment of the present disclosure, multiple first keypoints of the input first style image may be extracted and the first object feature of each first keypoint obtained, and multiple second keypoints of the input second style image may be extracted and the second object feature of each second keypoint obtained, where the first keypoints and the second keypoints may include points associated with the contours of facial features, such as the nose, eye corners, and lips. The keypoint similarity between the first object feature and the second object feature is then computed for each keypoint shared by the first and second keypoints, and the mean of the keypoint similarities over all keypoints is taken as the similarity between the first object feature and the second object feature.
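For concreteness, the keypoint-wise similarity described above could be computed as in the following minimal sketch, assuming both feature sets are extracted at the same ordered set of keypoints; the function name and tensor layout are illustrative assumptions, not part of the disclosure.

```python
import torch
import torch.nn.functional as F

def keypoint_similarity(first_feats: torch.Tensor, second_feats: torch.Tensor) -> torch.Tensor:
    """Mean cosine similarity over matching keypoints.

    first_feats / second_feats: (num_keypoints, feature_dim) tensors holding the
    first and second object features extracted at the same ordered keypoints
    (nose, eye corners, lip contour, and so on).
    """
    per_keypoint = F.cosine_similarity(first_feats, second_feats, dim=1)  # (num_keypoints,)
    return per_keypoint.mean()  # scalar similarity used to look up the fusion weights
```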
Further, in some possible embodiments, a correspondence between the first weight and the similarity may be constructed in advance according to scene requirements; after the first weight is obtained by querying this correspondence, the second weight is obtained based on the first weight.
In other possible implementations, the difference between the similarity and a preset standard similarity may be calculated, a preset correspondence may be queried based on this difference to obtain a weight correction value, the first weight may be obtained as the sum of a standard first weight value and the weight correction value, and the second weight is then obtained based on the first weight.
Step 302: obtain a first product of the output image of the first adversarial generation network and the first weight, obtain a second product of the output image of the second adversarial generation network and the second weight, and fuse the first product and the second product to generate the style conversion network.
In this embodiment, as shown in FIG. 4, the first product of the output image of the first adversarial generation network and the first weight is obtained, where the first product corresponds to the first style; the second product of the output image of the second adversarial generation network and the second weight is obtained, where the second product corresponds to the second style; and the first product and the second product are fused to obtain a fusion result of the first style and the second style, thereby generating the style conversion network. The output image in this embodiment can be regarded as a variable, and the style conversion network is a combination of processing network parameters applied to that variable.
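Read as a weighted sum, the fusion in step 302 might look like the sketch below; it assumes both generator outputs share the same shape and that the weights come from the similarity lookup described in step 301.

```python
import torch

def fused_style_output(g1_output: torch.Tensor, g2_output: torch.Tensor,
                       w1: float, w2: float) -> torch.Tensor:
    """Weighted fusion of the two generators' outputs per step 302.

    g1_output: output image of the first adversarial generation network (first style).
    g2_output: output image of the second adversarial generation network (second style).
    w1 / w2: fusion weights derived from the feature similarity, e.g. w2 = 1 - w1.
    """
    return w1 * g1_output + w2 * g2_output  # first product + second product
```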
In another embodiment of the present disclosure, considering that the first adversarial generation network can convert an input image into an output image of the first style and the second adversarial generation network can convert an input image into an output image of the second style, as shown in FIG. 5, the outputs of the first and second adversarial networks may be connected to an alignment network. The alignment network is used to align the output objects of the two adversarial networks, where the alignment processing includes one or more of pose alignment, pixel color alignment, and the like (the alignment algorithm may follow matrix alignment, feature-point alignment, and other approaches, which are not repeated here). Thus, in this embodiment, the style conversion network includes the first adversarial generation network, the second adversarial generation network, and the corresponding alignment network.
Therefore, with the image processing method of the embodiments of the present disclosure, training the style conversion network does not require pre-processing to obtain first style sample images together with second style sample images corresponding to them; that is, no computing power is spent converting images from the first style to the second style. After the first and second adversarial generation networks process the input images, a style conversion network capable of style conversion processing is obtained by fusing the first-style and second-style outputs, which reduces the training computing power consumed by the style network.
In summary, the image processing method of the embodiments of the present disclosure acquires a first object feature of a first style sample image and trains a first adversarial generation network according to the first object feature and the first style sample image; acquires a second object feature of a second style sample image and trains a second adversarial generation network according to the second object feature and the second style sample image; and performs fusion processing on the two adversarial generation networks to generate a style conversion network, so as to perform image style conversion processing on images of the first style and the second style according to the style conversion network. This reduces the computing power required for processing sample images during image style conversion and, while ensuring the style conversion effect, improves the training efficiency of the style conversion network.
It should be noted that, in different application scenarios, the first adversarial generation network and the second adversarial generation network are trained in different ways, as exemplified below:
In an embodiment of the present disclosure, as shown in FIG. 6, acquiring the first object feature of the first style sample image and training the first adversarial generation network according to the first object feature and the first style sample image includes:
Step 601: perform keypoint segmentation detection on the first object in the first style sample image, and extract key-region contour features of the first object.
The first object is the entity object whose style is to be converted, including but not limited to a human face, clothing, and the like.
In this embodiment, to improve processing efficiency, keypoint segmentation detection is performed on the first object in the first style sample image to extract key-region contour features of the first object. That is, different regions of the first object are identified based on keypoint detection technology, and the first object is then segmented into multiple key regions, so that subsequent image processing can be performed at the granularity of key regions.
The keypoints used in keypoint segmentation detection may be predefined or learned from experimental data. Taking a human face as the first object, as shown in FIG. 7, the corresponding keypoints may be keypoints of the nose region, the left eye region, the right eye region, the mouth region, other facial regions, and so on. Key-region contour features are then extracted based on these keypoints; the contour features include but are not limited to the pixel positions corresponding to the key-region contours and the positional relationships between those pixels.
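The key-region contour features described here could be organized as in the following sketch; the region-to-keypoint grouping REGIONS is hypothetical, standing in for whatever a face parsing or landmark model would supply.

```python
import numpy as np

# Hypothetical grouping of landmark indices into key facial regions; a real
# system would obtain the indices from its face parsing / landmark model.
REGIONS = {"nose": [0, 1, 2], "left_eye": [3, 4], "right_eye": [5, 6], "mouth": [7, 8, 9]}

def region_contour_features(keypoints: np.ndarray) -> dict:
    """Build per-region contour features from detected keypoints.

    keypoints: (num_points, 2) array of (x, y) pixel positions. For each key
    region, the feature keeps the contour pixel positions and the pairwise
    offsets between them, matching the 'pixel positions and positional
    relationships' described in the text.
    """
    features = {}
    for name, indices in REGIONS.items():
        points = keypoints[indices]                        # contour pixel positions
        offsets = points[:, None, :] - points[None, :, :]  # positional relations between points
        features[name] = {"points": points, "offsets": offsets}
    return features
```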
Step 602: process the key-region contour features of the first object through the generation network in the first adversarial generation network to be trained, to generate a first reference sample image.
In this embodiment, the key-region contour features of the first object are processed by the generation network in the first adversarial generation network to be trained to generate a first reference sample image, where the first reference sample image is a first-style image in the dimension of the extracted key-region contour features.
Step 603: determine a first loss function according to the first style sample image and the first reference sample image.
It is easy to understand that, since the first adversarial generation network should output images of the first style, the corresponding first adversarial generation network can be trained through a first loss function between the first reference sample image and the first style sample image.
It should be noted that the first loss function is calculated differently in different application scenarios, for example:
In some possible embodiments, the optical flow field from the first reference sample image to the first style sample image may be calculated, i.e., the motion optical flow field of the same keypoints from the first reference sample image to the first style sample image, and the first loss function is determined based on the motion optical flow field. The motion optical flow field indicates the alignment error between the first reference sample image and the first style sample image; the larger the optical flow field, the larger the error between the two images.
In other possible embodiments, to improve the calculation efficiency of the first loss function, the first reference sample image is divided into multiple grid blocks, the first style sample image is divided into grid blocks according to the same grid division strategy, the pixel mean of all pixels contained in each grid block is calculated, and the first loss function is determined based on the differences between the pixel means of grid blocks at corresponding positions in the first reference sample image and the first style sample image. For example, the mean of the differences between the pixel means over all grid blocks is taken as the first loss function.
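A minimal sketch of the grid-block variant of the first loss function follows, assuming image sides divisible by the grid size; average pooling computes the per-block pixel means.

```python
import torch
import torch.nn.functional as F

def grid_mean_loss(reference: torch.Tensor, style: torch.Tensor, grid: int = 8) -> torch.Tensor:
    """Grid-block pixel-mean loss.

    reference / style: (B, C, H, W) images, with H and W assumed divisible by
    `grid`. Each image is divided into grid x grid blocks, the pixel mean of
    each block is computed via average pooling, and the loss is the mean of
    the differences between corresponding block means.
    """
    block_h = reference.shape[-2] // grid
    block_w = reference.shape[-1] // grid
    reference_means = F.avg_pool2d(reference, kernel_size=(block_h, block_w))
    style_means = F.avg_pool2d(style, kernel_size=(block_h, block_w))
    return (reference_means - style_means).abs().mean()
```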
Step 604: perform backpropagation according to the first loss function to train the first adversarial generation network.
In this embodiment, backpropagation is performed according to the first loss function to train the first adversarial generation network; that is, the network parameters of the first adversarial generation network to be trained are adjusted so that, after adjustment, the first adversarial generation network can output images consistent with the first style.
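One plausible shape for the backpropagation step of step 604, with the generator, optimizer, and loss function passed in as assumed components:

```python
import torch

def train_step(generator, optimizer, contour_features, style_image, loss_fn):
    """One backpropagation step for the first adversarial generation network.

    generator: the generation network of the adversarial network (assumed module).
    contour_features: key-region contour features of the first object.
    style_image: the first style sample image serving as the training target.
    loss_fn: the first loss function, e.g. the grid-mean loss sketched above.
    """
    reference_image = generator(contour_features)  # first reference sample image
    loss = loss_fn(reference_image, style_image)
    optimizer.zero_grad()
    loss.backward()      # backpropagate the first loss
    optimizer.step()     # adjust the network parameters
    return loss.item()
```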
To achieve smooth style conversion, the second adversarial generation network may be trained in the same way as the first adversarial generation network.
In this embodiment, as shown in FIG. 8, acquiring the second object feature of the second style sample image and training the second adversarial generation network according to the second object feature and the second style sample image includes:
Step 801: perform keypoint segmentation detection on the second object in the second style sample image, and extract key-region contour features of the second object.
The second object is the entity object whose style is to be converted, including but not limited to a human face, clothing, and the like. The first object may be of the same kind as the second object; for example, if the first object is a human face, the second object is also a human face. Of course, the first object and the second object may also differ; for example, the first object may be a human face while the second object is a cat face.
In this embodiment, to improve processing efficiency, keypoint segmentation detection is performed on the second object in the second style sample image to extract key-region contour features of the second object. That is, different regions of the second object are identified based on keypoint detection technology, and the second object is then segmented into multiple key regions, so that subsequent image processing can be performed at the granularity of key regions.
The keypoints used in keypoint segmentation detection may be predefined or learned from experimental data. Taking a human face as the second object, the corresponding keypoints may be keypoints of the nose region, the left eye region, the right eye region, the mouth region, other facial regions, and so on. Key-region contour features are then extracted based on these keypoints; the contour features include but are not limited to the pixel positions corresponding to the key-region contours and the positional relationships between those pixels.
Step 802: process the key-region contour features of the second object through the generation network in the second adversarial generation network to be trained, to generate a second reference sample image.
In this embodiment, the key-region contour features of the second object are processed by the generation network in the second adversarial generation network to be trained to generate a second reference sample image, where the second reference sample image is a second-style image in the dimension of the extracted key-region contour features.
Step 803: determine a second loss function according to the second style sample image and the second reference sample image.
It is easy to understand that, since the second adversarial generation network should output images of the second style, the corresponding second adversarial generation network can be trained through a second loss function between the second reference sample image and the second style sample image.
It should be noted that the second loss function is calculated differently in different application scenarios, for example:
In some possible embodiments, the optical flow field from the second reference sample image to the second style sample image may be calculated, i.e., the motion optical flow field of the same keypoints from the second reference sample image to the second style sample image, and the second loss function is determined based on the motion optical flow field. The motion optical flow field indicates the alignment error between the second reference sample image and the second style sample image; the larger the optical flow field, the larger the error between the two images.
In other possible embodiments, to improve the calculation efficiency of the second loss function, the second reference sample image is divided into multiple grid blocks, the second style sample image is divided into grid blocks according to the same grid division strategy, the pixel mean of all pixels contained in each grid block is calculated, and the second loss function is determined based on the differences between the pixel means of grid blocks at corresponding positions in the second reference sample image and the second style sample image. For example, the mean of the differences between the pixel means over all grid blocks is taken as the second loss function.
Step 804: perform backpropagation according to the second loss function to train the second adversarial generation network.
In this embodiment, backpropagation is performed according to the second loss function to train the second adversarial generation network; that is, the network parameters of the second adversarial generation network to be trained are adjusted so that, after adjustment, the second adversarial generation network can output images consistent with the second style.
In another embodiment of the present disclosure, referring to FIG. 9, first-object keypoint segmentation detection is performed on the first style sample image, for example, face keypoint segmentation detection based on face parsing technology, to obtain the key-region contour feature mask1 of the first object. After mask1 is obtained, mask1 is encoded to obtain a first encoding result, and the first style sample image is encoded to obtain a second encoding result; the first encoding result and the second encoding result are fused to obtain a first feature image. On the one hand, the first feature image embodies the contour features of the first object in the key-region contours; on the other hand, by incorporating the original first style sample image, it retains the features of the original first style.
Then, a second feature map is obtained by fusing the first feature image with mask1, and the second feature map is input into the first adversarial generation network to obtain a corresponding third reference sample image. The loss value between the third reference sample image and the first style sample image is calculated; if the loss value is greater than a preset threshold, the network parameters of the first adversarial generation network are adjusted until the loss value is less than the preset threshold, at which point the training of the first adversarial generation network is complete.
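The mask1 pipeline with its threshold-based stopping rule might be sketched as follows; the encoders and the additive fusion of encodings are assumptions (the disclosure does not fix the fusion operator, and the mask is re-fused here via its encoding for shape compatibility).

```python
import torch

def train_with_mask(encode_mask, encode_image, generator, optimizer,
                    style_image, mask1, loss_fn, threshold=0.05, max_steps=1000):
    """Threshold-driven training loop sketched from the mask1 pipeline (FIG. 9).

    encode_mask / encode_image / generator are assumed modules; max_steps is
    assumed to be at least 1. Fusion of encodings is modeled as addition.
    """
    for _ in range(max_steps):
        enc_mask = encode_mask(mask1)           # first encoding result
        enc_img = encode_image(style_image)     # second encoding result
        feat1 = enc_mask + enc_img              # first feature image (fusion)
        feat2 = feat1 + enc_mask                # second feature map (re-fused with mask1's encoding)
        reference = generator(feat2)            # third reference sample image
        loss = loss_fn(reference, style_image)
        if loss.item() < threshold:             # training complete below threshold
            break
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                        # adjust the network parameters
    return loss.item()
```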
Similarly, in this embodiment, referring to FIG. 10, second-object keypoint segmentation detection is performed on the second style sample image, for example, face keypoint segmentation detection based on face parsing technology, to obtain the key-region contour feature mask2 of the second object. After mask2 is obtained, mask2 is encoded to obtain a third encoding result, and the second style sample image is encoded to obtain a fourth encoding result; the third encoding result and the fourth encoding result are fused to obtain a third feature image. On the one hand, the third feature image embodies the contour features of the second object in the key-region contours; on the other hand, by incorporating the original second style sample image, it retains the features of the original second style.
Then, a fourth feature map is obtained by fusing the third feature image with the third encoding result, and the fourth feature map is input into the second adversarial generation network to obtain a corresponding fourth reference sample image. The loss value between the fourth reference sample image and the second style sample image is calculated; if the loss value is greater than a preset threshold, the network parameters of the second adversarial generation network are adjusted until the loss value is less than the preset threshold.
In summary, the image processing method of the embodiments of the present disclosure can train the adversarial generation networks in combination with key-region contour features according to the needs of the scene, improving the training efficiency of the adversarial generation networks while ensuring their training accuracy.
In the above embodiments, the loss functions are all calculated by considering the distance between the output of an adversarial generation network and the corresponding positive sample image; this calculation can cause the output image to lack detail and be overly smooth.
Therefore, in an embodiment of the present disclosure, the adversarial generation networks are trained in combination with negative sample images; that is, negative sample images may also be incorporated when calculating the loss functions. The training process of the adversarial generation networks is described below, taking the oil painting style as the first style and the plain-makeup style as the second style.
In this embodiment, as shown in FIG. 11, determining the first loss function according to the first style sample image and the first reference sample image includes:
Step 1101: perform fusion and noise-addition processing on the first style sample image and the first reference sample image to generate a first negative sample image.
In this embodiment, after the first style sample image and the first reference sample image are fused, random noise may be added to the fused image to obtain the first negative sample image. Relative to the first style sample image, the first negative sample image introduces not only the error of the first reference sample image but also a noise error.
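A minimal sketch of step 1101 follows, assuming an equal-weight blend and Gaussian noise; the text only specifies fusion followed by adding random noise.

```python
import torch

def make_negative_sample(style_image: torch.Tensor, reference_image: torch.Tensor,
                         noise_std: float = 0.1) -> torch.Tensor:
    """Fuse the style sample with the reference sample, then add random noise.

    The 50/50 blend and the Gaussian noise level are assumptions; any fusion
    followed by random noise would match the described procedure.
    """
    fused = 0.5 * style_image + 0.5 * reference_image
    return fused + noise_std * torch.randn_like(fused)  # noisy negative sample
```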
Step 1102: extract first high-frequency information of the first style sample image, second high-frequency information of the first reference sample image, and third high-frequency information of the first negative sample image.
In this embodiment, the first high-frequency information of the first style sample image, the second high-frequency information of the first reference sample image, and the third high-frequency information of the first negative sample image are extracted, where the high-frequency information of an image can be understood as the pixel information of detail-rich pixels with large brightness differences.
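High-frequency information as described here could be approximated by subtracting a low-pass copy of the image, as in this sketch; the box blur is one assumed choice of low-pass filter.

```python
import torch
import torch.nn.functional as F

def high_frequency(image: torch.Tensor, blur_kernel: int = 5) -> torch.Tensor:
    """High-frequency information as the image minus a low-pass (blurred) copy.

    image: (B, C, H, W) tensor. A box blur implemented with avg_pool2d stands
    in for any low-pass filter; only pixels with large local brightness
    differences survive the subtraction.
    """
    pad = blur_kernel // 2
    padded = F.pad(image, (pad, pad, pad, pad), mode="reflect")
    low_pass = F.avg_pool2d(padded, kernel_size=blur_kernel, stride=1)
    return image - low_pass
```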
Step 1103: perform discrimination processing on the first high-frequency information, the second high-frequency information, and the third high-frequency information through the discrimination network in the first adversarial generation network to generate corresponding discrimination scores.
In this embodiment, the discrimination network in the first adversarial generation network performs discrimination processing on the first, second, and third high-frequency information to generate corresponding discrimination scores, which represent the discriminator's scores for the first, second, and third high-frequency information belonging to the first style.
Step 1104: determine the first loss function according to the discrimination scores.
In this embodiment, the first loss function is determined according to the discrimination scores. For example, a first squared absolute error value between the first high-frequency information and the second high-frequency information of the first reference sample image, and a second squared absolute error value between the first high-frequency information and the third high-frequency information of the first negative sample image, may be calculated directly, and the ratio of the first squared absolute error value to the second squared absolute error value taken as the first loss function.
Alternatively, a first difference between the first high-frequency information and the second high-frequency information of the first reference sample image, and a second difference between the first high-frequency information and the third high-frequency information of the first negative sample image, may be calculated directly, and the ratio of the first difference to the second difference taken as the first loss function.
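The ratio-style first loss might be written as below, mirroring the first variant in the text (squared absolute errors anchored on the first high-frequency information); the epsilon guard is an added assumption for numerical safety.

```python
import torch

def ratio_loss_hf(style_hf: torch.Tensor, reference_hf: torch.Tensor,
                  negative_hf: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """First loss as the ratio of the two squared absolute error values.

    style_hf: first high-frequency information (first style sample image).
    reference_hf: second high-frequency information (first reference sample image).
    negative_hf: third high-frequency information (first negative sample image).
    Minimizing the ratio pulls the generated image toward the style sample
    while pushing it away from the noisy negative sample.
    """
    first_error = ((style_hf - reference_hf) ** 2).mean()   # first squared absolute error
    second_error = ((style_hf - negative_hf) ** 2).mean()   # second squared absolute error
    return first_error / (second_error + eps)               # eps guards against division by zero
```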
Thus, in this embodiment, when training the first adversarial generation network, the image output by the trained network is made close to the first style sample image at the feature level while staying far from the first negative sample image, which reduces the introduction of artifacts and noise and ensures that the image output by the first adversarial generation network is in the oil painting style.
Similarly, in this embodiment, as shown in FIG. 12, determining the second loss function according to the second style sample image and the second reference sample image includes:
Step 1201: perform fusion and noise-addition processing on the second style sample image and the second reference sample image to generate a second negative sample image.
In this embodiment, after the second style sample image and the second reference sample image are fused, random noise may be added to the fused image to obtain the second negative sample image. Relative to the second style sample image, the second negative sample image introduces not only the error of the second reference sample image but also a noise error.
Step 1202: extract a first texture feature of the second style sample image, a second texture feature of the second reference sample image, and a third texture feature of the second negative sample image.
In this embodiment, the first texture feature of the second style sample image, the second texture feature of the second reference sample image, and the third texture feature of the second negative sample image are extracted, where the texture features reflect characteristics such as the color and brightness of pixels by which the corresponding image belongs to the plain-makeup style.
Step 1203: determine the second loss function according to the first texture feature, the second texture feature, and the third texture feature.
In this embodiment, the discrimination network in the second adversarial generation network may perform discrimination processing on the first, second, and third texture features to generate corresponding discrimination scores, which represent the discriminator's scores for the first, second, and third texture features belonging to the second style.
The second loss function may then be determined according to the discrimination scores. For example, a third squared absolute error value between the first texture feature and the second texture feature of the second reference sample image, and a fourth squared absolute error value between the first texture feature and the third texture feature of the second negative sample image, may be calculated directly, and the ratio of the third squared absolute error value to the fourth squared absolute error value taken as the second loss function.
Alternatively, a third difference between the first texture feature and the second texture feature, and a fourth difference between the first texture feature and the third texture feature, may be calculated directly, and the ratio of the third difference to the fourth difference taken as the second loss function.
Thus, in this embodiment, when training the second adversarial generation network, the image output by the trained network is made close to the second style sample image at the feature level while staying far from the second negative sample image, which reduces the introduction of artifacts and noise and ensures that the image output by the second adversarial generation network is in the plain-makeup style.
Further, when images of the first style and the second style undergo image style conversion according to the style conversion network, the key-region contour features of the target object in an original image of the plain-makeup style may first be extracted, where the target object includes but is not limited to the facial parts mentioned above; the plain-makeup original image and the key-region contour features of the target object are then encoded to generate feature data of the target object.
Further, after the feature data of the target object is obtained, since the pre-trained style conversion network incorporates the network characteristics of the second adversarial network, the style conversion network can perform image fusion on the feature data of the target object and the key-region contour features of the target object, and then convert the fused image into the oil-painting style domain to obtain a target image in the oil-painting style.
It can also be understood as follows: the pre-trained first adversarial generation network can, from an input plain-makeup original image, extract the key contour features of the target object that reflect the characteristics of the plain-makeup style; it then encodes the plain-makeup original image together with the key-region contour features of the target object in the plain-makeup dimension to generate feature data of the target object, and performs image fusion on that feature data and the key-region contour features to obtain a new original image carrying the key-region contours of the plain-makeup style.
This new original image is input into the pre-trained second adversarial generation network, which extracts the key contour features of its target object in the oil-painting-style dimension and encodes the new original image together with those features to generate new feature data of the target object. Since the second adversarial generation network can produce an image of the second style from the features of an input image, it obtains the oil-painting-style target image from the new feature data. The weights that the style conversion network applies to the first adversarial generation network and the second adversarial generation network are embodied in the product taken with each network's output; the details follow the foregoing embodiments and are not repeated here.
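A sketch of this two-stage inference path under stated assumptions: g1 and g2 stand for the generators of the two adversarial networks, w1 and w2 for the per-network weights, and fusing the two weighted products by summation is one plausible reading of the weighting scheme described above.

```python
import torch

@torch.no_grad()
def convert_style(x: torch.Tensor, g1, g2, w1: float, w2: float) -> torch.Tensor:
    """Plain-makeup input -> contour-level intermediate -> oil-painting output.

    g1: generator of the first adversarial generation network
    g2: generator of the second adversarial generation network
    w1, w2: weights applied to each network's output (the first and second
    product results), fused here by weighted summation.
    """
    intermediate = g1(x)          # new original image in the contour dimension
    stylized = g2(intermediate)   # oil-painting-style output
    return w1 * intermediate + w2 * stylized
```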
Thus, as shown in FIG. 13, when a plain-makeup image is input, the style conversion network of this embodiment can fuse the outputs of the first adversarial generation network and the second adversarial generation network to obtain the corresponding oil-painting-style image, which is rich in detail and highly realistic.
In summary, the image processing method of the embodiments of the present disclosure trains the corresponding adversarial generation networks with feature-level loss values that combine the distances from the input style sample image to the positive sample image and to the negative sample image. On the basis of guaranteeing rich image detail in the networks' outputs, this improves the purity of the output image, so that the style conversion effect of the fused target image is consistent with the second style.
To implement the above embodiments, the present disclosure further provides an image processing apparatus.
FIG. 14 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present disclosure. The apparatus may be implemented in software and/or hardware and may generally be integrated into an electronic device to perform image processing. As shown in FIG. 14, the apparatus includes:
a first training module 1610, configured to obtain a first object feature of a first style sample image and train a first adversarial generation network according to the first object feature and the first style sample image;
a second training module 1620, configured to obtain a second object feature of a second style sample image and train a second adversarial generation network according to the second object feature and the second style sample image; and
a fusion module 1630, configured to fuse the first adversarial generation network and the second adversarial generation network to generate a style conversion network, so that images of the first style and the second style undergo image style conversion according to the style conversion network. The image processing apparatus provided by the embodiments of the present disclosure can execute the image processing method provided by any embodiment of the present disclosure, and has the corresponding functional modules and beneficial effects.
To implement the above embodiments, the present disclosure further provides a computer program product including a computer program/instructions which, when executed by a processor, implement the image processing method of the above embodiments.
FIG. 15 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
Referring now to FIG. 15, it shows a schematic structural diagram of an electronic device 1700 suitable for implementing an embodiment of the present disclosure. The electronic device 1700 in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (e.g., in-vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 15 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in FIG. 15, the electronic device 1700 may include a processor 1701 (e.g., a central processing unit, a graphics processing unit, etc.), which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1702 or a program loaded from a memory 1708 into a random access memory (RAM) 1703. The RAM 1703 also stores various programs and data required for the operation of the electronic device 1700. The processor 1701, the ROM 1702, and the RAM 1703 are connected to one another through a bus 1704. An input/output (I/O) interface 1705 is also connected to the bus 1704.
Generally, the following devices may be connected to the I/O interface 1705: input devices 1706 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; output devices 1707 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; the memory 1708 including, for example, a magnetic tape or a hard disk; and a communication device 1709. The communication device 1709 may allow the electronic device 1700 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 15 shows the electronic device 1700 with various devices, it should be understood that implementing or providing all of the illustrated devices is not required; more or fewer devices may alternatively be implemented or provided.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, the embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 1709, installed from the memory 1708, or installed from the ROM 1702. When the computer program is executed by the processor 1701, the above functions defined in the image processing method of the embodiments of the present disclosure are executed.
It should be noted that the computer-readable medium described above in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and it may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to an electric wire, an optical cable, RF (radio frequency), or any suitable combination of the above.
In some implementations, the client and the server may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and a peer-to-peer network (e.g., an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
The computer-readable medium may be included in the electronic device described above, or it may exist separately without being assembled into the electronic device.
The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: obtain a first object feature of a first style sample image and train a first adversarial generation network according to the first object feature and the first style sample image; obtain a second object feature of a second style sample image and train a second adversarial generation network according to the second object feature and the second style sample image; and fuse the first adversarial generation network and the second adversarial generation network to generate a style conversion network, so that images of the first style and the second style undergo image style conversion according to the style conversion network. This reduces the computing power required to process sample images during image style conversion and, on the premise of guaranteeing the style conversion effect, improves the training efficiency of the style conversion network.
Computer program code for executing the operations of the present disclosure may be written in one or more programming languages or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the architectures, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself.
The functions described herein above may be executed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include: field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), and complex programmable logic devices (CPLDs).
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in combination with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the above. More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
According to one or more embodiments of the present disclosure, the present disclosure provides an image processing method, including: obtaining a first object feature of a first style sample image, and training a first adversarial generation network according to the first object feature and the first style sample image;
obtaining a second object feature of a second style sample image, and training a second adversarial generation network according to the second object feature and the second style sample image; and
fusing the first adversarial generation network and the second adversarial generation network to generate a style conversion network, so that images of the first style and the second style undergo image style conversion according to the style conversion network.
According to one or more embodiments of the present disclosure, in the image processing method provided by the present disclosure, obtaining the first object feature of the first style sample image and training the first adversarial generation network according to the first object feature and the first style sample image includes:
performing key-point segmentation detection on a first object in the first style sample image, and extracting key-region contour features of the first object;
processing the key-region contour features of the first object through the generation network in the first adversarial generation network to be trained to generate a first reference sample image;
determining a first loss function according to the first style sample image and the first reference sample image; and
performing back-propagation according to the first loss function to train the first adversarial generation network. A minimal sketch of one such training step is given below.
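The sketch makes the following assumptions, which are not fixed by this disclosure: gen and disc are the generation and discrimination networks of the first adversarial generation network, opt_g and opt_d are ordinary PyTorch optimizers, and a standard non-saturating GAN objective stands in for the first loss derived from discriminator scores.

```python
import torch
import torch.nn.functional as F

def train_step_first_gan(gen, disc, opt_g, opt_d,
                         contour_feats: torch.Tensor,
                         style_imgs: torch.Tensor):
    """One back-propagation step for the first adversarial generation network.

    contour_feats: key-region contour features of the first object
    style_imgs:    first-style (e.g., oil-painting) sample images
    """
    # Discriminator update: real style samples vs. generated reference samples.
    ref_imgs = gen(contour_feats).detach()  # first reference sample images
    loss_d = (F.softplus(-disc(style_imgs)) + F.softplus(disc(ref_imgs))).mean()
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator update via back-propagation of the (stand-in) first loss.
    loss_g = F.softplus(-disc(gen(contour_feats))).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```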
According to one or more embodiments of the present disclosure, in the image processing method provided by the present disclosure, obtaining the second object feature of the second style sample image and training the second adversarial generation network according to the second object feature and the second style sample image includes:
performing key-point segmentation detection on a second object in the second style sample image, and extracting key-region contour features of the second object;
processing the key-region contour features of the second object through the generation network in the second adversarial generation network to be trained to generate a second reference sample image;
determining a second loss function according to the second style sample image and the second reference sample image; and
performing back-propagation according to the second loss function to train the second adversarial generation network.
According to one or more embodiments of the present disclosure, in the image processing method provided by the present disclosure, fusing the first adversarial generation network and the second adversarial generation network to generate the style conversion network includes:
determining, according to the similarity between the first object feature and the second object feature, a first weight corresponding to the first adversarial generation network and a second weight corresponding to the second adversarial generation network; and
obtaining a first product result of the output image of the first adversarial generation network and the first weight, obtaining a second product result of the output image of the second adversarial generation network and the second weight, and fusing the first product result and the second product result to generate the style conversion network. A sketch of one way to derive such weights follows.
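The sketch assumes flattened object-feature tensors; mapping cosine similarity into a pair of weights that sum to one is an illustrative choice — the disclosure only requires that the weights depend on the feature similarity.

```python
import torch
import torch.nn.functional as F

def fusion_weights(feat1: torch.Tensor, feat2: torch.Tensor):
    """Derive the per-network weights from object-feature similarity.

    feat1, feat2: flattened first/second object features, shape (B, D).
    Cosine similarity s in [-1, 1] is mapped to w1 = (1 + s) / 2 and
    w2 = 1 - w1, so the two product results can be fused directly.
    """
    s = F.cosine_similarity(feat1, feat2, dim=-1).mean()
    w1 = (1.0 + s) / 2.0
    return w1, 1.0 - w1
```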
According to one or more embodiments of the present disclosure, in the image processing method provided by the present disclosure, the first style is an oil-painting style and the second style is a plain-makeup style;
determining the first loss function according to the first style sample image and the first reference sample image includes:
performing fusion and noise-addition processing on the first style sample image and the first reference sample image to generate a first negative sample image;
extracting first high-frequency information of the first style sample image, second high-frequency information of the first reference sample image, and third high-frequency information of the first negative sample image;
performing discrimination processing on the first high-frequency information, the second high-frequency information, and the third high-frequency information through the discrimination network in the first adversarial generation network to generate corresponding discrimination scores; and
determining the first loss function according to the discrimination scores. A sketch of the high-frequency pathway follows this list.
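The sketch takes the high-frequency information as the residual between an image and a blurred copy of itself, one common extraction scheme that the disclosure does not mandate; disc stands for the discrimination network of the first adversarial generation network.

```python
import torch
import torch.nn.functional as F

def high_freq(img: torch.Tensor, k: int = 5) -> torch.Tensor:
    """High-frequency residual: the image minus a box-blurred copy of itself."""
    blur = F.avg_pool2d(img, kernel_size=k, stride=1, padding=k // 2)
    return img - blur

def discrimination_scores(disc, style_img, ref_img, neg_img):
    """Discriminator scores over the three high-frequency components."""
    return (disc(high_freq(style_img)),  # first high-frequency information
            disc(high_freq(ref_img)),    # second high-frequency information
            disc(high_freq(neg_img)))    # third high-frequency information
```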
According to one or more embodiments of the present disclosure, in the image processing method provided by the present disclosure, determining the second loss function according to the second style sample image and the second reference sample image includes:
performing fusion and noise-addition processing on the second style sample image and the second reference sample image to generate a second negative sample image;
extracting a first texture feature of the second style sample image, a second texture feature of the second reference sample image, and a third texture feature of the second negative sample image; and
determining the second loss function according to the first texture feature, the second texture feature, and the third texture feature.
According to one or more embodiments of the present disclosure, in the image processing method provided by the present disclosure, performing image style conversion on the images of the first style and the second style according to the style conversion network includes:
extracting key-region contour features of a target object in an original image having the plain-makeup style;
encoding the plain-makeup original image and the key-region contour features of the target object to generate feature data of the target object; and
performing image fusion on the feature data of the target object and the key-region contour features of the target object through the style conversion network to generate a target image having the oil-painting style.
According to one or more embodiments of the present disclosure, the present disclosure provides an image processing apparatus, including:
a first training module, configured to obtain a first object feature of a first style sample image and train a first adversarial generation network according to the first object feature and the first style sample image;
a second training module, configured to obtain a second object feature of a second style sample image and train a second adversarial generation network according to the second object feature and the second style sample image; and
a fusion module, configured to fuse the first adversarial generation network and the second adversarial generation network to generate a style conversion network, so that images of the first style and the second style undergo image style conversion according to the style conversion network.
According to one or more embodiments of the present disclosure, in the image processing apparatus provided by the present disclosure, the first training module is specifically configured to:
perform key-point segmentation detection on a first object in the first style sample image, and extract key-region contour features of the first object;
process the key-region contour features of the first object through the generation network in the first adversarial generation network to be trained to generate a first reference sample image;
determine a first loss function according to the first style sample image and the first reference sample image; and
perform back-propagation according to the first loss function to train the first adversarial generation network.
According to one or more embodiments of the present disclosure, in the image processing apparatus provided by the present disclosure, the second training module is specifically configured to:
perform key-point segmentation detection on a second object in the second style sample image, and extract key-region contour features of the second object;
process the key-region contour features of the second object through the generation network in the second adversarial generation network to be trained to generate a second reference sample image;
determine a second loss function according to the second style sample image and the second reference sample image; and
perform back-propagation according to the second loss function to train the second adversarial generation network.
According to one or more embodiments of the present disclosure, in the image processing apparatus provided by the present disclosure, the fusion module is specifically configured to:
determine, according to the similarity between the first object feature and the second object feature, a first weight corresponding to the first adversarial generation network and a second weight corresponding to the second adversarial generation network; and
obtain a first product result of the output image of the first adversarial generation network and the first weight, obtain a second product result of the output image of the second adversarial generation network and the second weight, and fuse the first product result and the second product result to generate the style conversion network.
According to one or more embodiments of the present disclosure, in the image processing apparatus provided by the present disclosure, the first style is an oil-painting style and the second style is a plain-makeup style; the first training module is specifically configured to:
perform fusion and noise-addition processing on the first style sample image and the first reference sample image to generate a first negative sample image;
extract first high-frequency information of the first style sample image, second high-frequency information of the first reference sample image, and third high-frequency information of the first negative sample image;
perform discrimination processing on the first high-frequency information, the second high-frequency information, and the third high-frequency information through the discrimination network in the first adversarial generation network to generate corresponding discrimination scores; and
determine the first loss function according to the discrimination scores.
According to one or more embodiments of the present disclosure, in the image processing apparatus provided by the present disclosure, the second training module is specifically configured to:
perform fusion and noise-addition processing on the second style sample image and the second reference sample image to generate a second negative sample image;
extract a first texture feature of the second style sample image, a second texture feature of the second reference sample image, and a third texture feature of the second negative sample image; and
determine the second loss function according to the first texture feature, the second texture feature, and the third texture feature.
According to one or more embodiments of the present disclosure, in the image processing apparatus provided by the present disclosure, the fusion module is specifically configured to:
extract key-region contour features of a target object in an original image having the plain-makeup style;
encode the plain-makeup original image and the key-region contour features of the target object to generate feature data of the target object; and
perform image fusion on the feature data of the target object and the key-region contour features of the target object through the style conversion network to generate a target image having the oil-painting style.
According to one or more embodiments of the present disclosure, the present disclosure provides an electronic device, including:
a processor; and
a memory configured to store instructions executable by the processor;
wherein the processor is configured to read the executable instructions from the memory and execute the instructions to implement any of the image processing methods provided by the present disclosure.
According to one or more embodiments of the present disclosure, the present disclosure provides a computer-readable storage medium storing a computer program, where the computer program is used to execute any of the image processing methods provided by the present disclosure.
The above description is merely a preferred embodiment of the present disclosure and an explanation of the applied technical principles. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by specific combinations of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
In addition, although the operations are depicted in a specific order, this should not be understood as requiring that the operations be performed in the specific order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although the above discussion contains several specific implementation details, these should not be construed as limitations on the scope of the present disclosure. Certain features described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments, separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological logical actions, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above. Rather, the specific features and actions described above are merely example forms of implementing the claims.

Claims (11)

  1. An image processing method, comprising:
    obtaining a first object feature of a first style sample image, and training a first adversarial generation network according to the first object feature and the first style sample image;
    obtaining a second object feature of a second style sample image, and training a second adversarial generation network according to the second object feature and the second style sample image; and
    fusing the first adversarial generation network and the second adversarial generation network to generate a style conversion network, so that images of the first style and the second style undergo image style conversion according to the style conversion network.
  2. The method according to claim 1, wherein obtaining the first object feature of the first style sample image and training the first adversarial generation network according to the first object feature and the first style sample image comprises:
    performing key-point segmentation detection on a first object in the first style sample image, and extracting key-region contour features of the first object;
    processing the key-region contour features of the first object through the generation network in the first adversarial generation network to be trained to generate a first reference sample image;
    determining a first loss function according to the first style sample image and the first reference sample image; and
    performing back-propagation according to the first loss function to train the first adversarial generation network.
  3. The method according to claim 2, wherein obtaining the second object feature of the second style sample image and training the second adversarial generation network according to the second object feature and the second style sample image comprises:
    performing key-point segmentation detection on a second object in the second style sample image, and extracting key-region contour features of the second object;
    processing the key-region contour features of the second object through the generation network in the second adversarial generation network to be trained to generate a second reference sample image;
    determining a second loss function according to the second style sample image and the second reference sample image; and
    performing back-propagation according to the second loss function to train the second adversarial generation network.
  4. The method according to any one of claims 1-3, wherein fusing the first adversarial generation network and the second adversarial generation network to generate the style conversion network comprises:
    determining, according to the similarity between the first object feature and the second object feature, a first weight corresponding to the first adversarial generation network and a second weight corresponding to the second adversarial generation network; and
    obtaining a first product result of the output image of the first adversarial generation network and the first weight, obtaining a second product result of the output image of the second adversarial generation network and the second weight, and fusing the first product result and the second product result to generate the style conversion network.
  5. The method according to any one of claims 3-4, wherein the first style is an oil-painting style and the second style is a plain-makeup style;
    determining the first loss function according to the first style sample image and the first reference sample image comprises:
    performing fusion and noise-addition processing on the first style sample image and the first reference sample image to generate a first negative sample image;
    extracting first high-frequency information of the first style sample image, second high-frequency information of the first reference sample image, and third high-frequency information of the first negative sample image;
    performing discrimination processing on the first high-frequency information, the second high-frequency information, and the third high-frequency information through the discrimination network in the first adversarial generation network to generate corresponding discrimination scores; and
    determining the first loss function according to the discrimination scores.
  6. The method according to claim 5, wherein determining the second loss function according to the second style sample image and the second reference sample image comprises:
    performing fusion and noise-addition processing on the second style sample image and the second reference sample image to generate a second negative sample image;
    extracting a first texture feature of the second style sample image, a second texture feature of the second reference sample image, and a third texture feature of the second negative sample image; and
    determining the second loss function according to the first texture feature, the second texture feature, and the third texture feature.
  7. The method according to claim 6, wherein performing image style conversion on the images of the first style and the second style according to the style conversion network comprises:
    extracting key-region contour features of a target object in an original image having the plain-makeup style;
    encoding the plain-makeup original image and the key-region contour features of the target object to generate feature data of the target object; and
    performing image fusion on the feature data of the target object and the key-region contour features of the target object through the style conversion network to generate a target image having the oil-painting style.
  8. An image processing apparatus, comprising:
    a first training module configured to obtain a first object feature of a first style sample image and train a first adversarial generation network according to the first object feature and the first style sample image;
    a second training module configured to obtain a second object feature of a second style sample image and train a second adversarial generation network according to the second object feature and the second style sample image; and
    a fusion module configured to fuse the first adversarial generation network and the second adversarial generation network to generate a style conversion network, so that images of the first style and the second style undergo image style conversion according to the style conversion network.
  9. An electronic device, comprising:
    a processor; and
    a memory configured to store instructions executable by the processor;
    wherein the processor is configured to read the executable instructions from the memory and execute the instructions to implement the image processing method according to any one of claims 1-7.
  10. A computer-readable storage medium storing a computer program, wherein the computer program is configured to execute the image processing method according to any one of claims 1-7.
  11. A computer program product comprising a computer program/instructions which, when executed by a processor, implement the method according to any one of claims 1-7.
PCT/CN2022/140574 2021-12-21 2022-12-21 Image processing method and apparatus, device, and medium WO2023116744A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111574622.1A CN116310615A (en) 2021-12-21 2021-12-21 Image processing method, device, equipment and medium
CN202111574622.1 2021-12-21

Publications (1)

Publication Number Publication Date
WO2023116744A1 true WO2023116744A1 (en) 2023-06-29

Family

ID=86831024

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/140574 WO2023116744A1 (en) 2021-12-21 2022-12-21 Image processing method and apparatus, device, and medium

Country Status (2)

Country Link
CN (1) CN116310615A (en)
WO (1) WO2023116744A1 (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111402112A (en) * 2020-03-09 2020-07-10 北京字节跳动网络技术有限公司 Image processing method, image processing device, electronic equipment and computer readable medium
CN111402151A (en) * 2020-03-09 2020-07-10 北京字节跳动网络技术有限公司 Image processing method, image processing device, electronic equipment and computer readable medium
US20210303927A1 (en) * 2020-03-25 2021-09-30 Microsoft Technology Licensing, Llc Multi-Task GAN, and Image Translator and Image Classifier Trained Thereby
CN111783603A (en) * 2020-06-24 2020-10-16 有半岛(北京)信息科技有限公司 Training method for generating confrontation network, image face changing method and video face changing method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116758379A (en) * 2023-08-14 2023-09-15 腾讯科技(深圳)有限公司 Image processing method, device, equipment and storage medium
CN116758379B (en) * 2023-08-14 2024-05-28 腾讯科技(深圳)有限公司 Image processing method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN116310615A (en) 2023-06-23

Similar Documents

Publication Publication Date Title
WO2022089360A1 (en) Face detection neural network and training method, face detection method, and storage medium
WO2020177673A1 (en) Video sequence selection method, computer device and storage medium
WO2022012179A1 (en) Method and apparatus for generating feature extraction network, and device and computer-readable medium
WO2023125379A1 (en) Character generation method and apparatus, electronic device, and storage medium
WO2023005386A1 (en) Model training method and apparatus
WO2023143178A1 (en) Object segmentation method and apparatus, device and storage medium
WO2024012255A1 (en) Semantic segmentation model training method and apparatus, electronic device, and storage medium
WO2023072015A1 (en) Method and apparatus for generating character style image, device, and storage medium
WO2023050868A1 (en) Method and apparatus for training fusion model, image fusion method and apparatus, and device and medium
WO2023232056A1 (en) Image processing method and apparatus, and storage medium and electronic device
WO2023217117A1 (en) Image assessment method and apparatus, and device, storage medium and program product
WO2023030381A1 (en) Three-dimensional human head reconstruction method and apparatus, and device and medium
CN113393544B (en) Image processing method, device, equipment and medium
WO2023116744A1 (en) Image processing method and apparatus, device, and medium
CN117095006B (en) Image aesthetic evaluation method, device, electronic equipment and storage medium
CN113610034B (en) Method and device for identifying character entities in video, storage medium and electronic equipment
WO2022012178A1 (en) Method for generating objective function, apparatus, electronic device and computer readable medium
CN117456063A (en) Face driving method and device based on voice, electronic equipment and storage medium
WO2023143118A1 (en) Image processing method and apparatus, device, and medium
WO2023130925A1 (en) Font recognition method and apparatus, readable medium, and electronic device
WO2023093481A1 (en) Fourier domain-based super-resolution image processing method and apparatus, device, and medium
WO2023071694A1 (en) Image processing method and apparatus, and electronic device and storage medium
WO2024040870A1 (en) Text image generation, training, and processing methods, and electronic device
WO2023040813A1 (en) Facial image processing method and apparatus, and device and medium
CN114049417B (en) Virtual character image generation method and device, readable medium and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 22910056
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE